Cyclodipeptide synthases (CDSs) and their use in the synthesis of linear dipeptides

ABSTRACT

Use of CDSs in the synthesis of linear dipeptides, and applications thereof for the in vivo and in vitro synthesis of linear dipeptides, in particular Phe-Leu, Leu-Phe, Phe-Phe, Phe-Tyr, Tyr-Phe, Leu-Leu, Leu-Tyr, Tyr-Leu, Phe-Met, Met-Phe, Leu-Met, Met-Leu, Tyr-Met, Met-Tyr, Met-Met, Tyr-Tyr, Ile-Met, Met-Ile, Leu-Ile, Ile-Leu using the corresponding polynucleotides.

The present invention relates to the use of CDSs in the synthesis of linear dipeptides (also called hereinafter straight-chain dipeptides), and the applications thereof for the in vivo and in vitro synthesis of linear dipeptides, in particular Phe-Leu, Leu-Phe, Phe-Phe, Phe-Tyr, Tyr-Phe, Leu-Leu, Leu-Tyr, Tyr-Leu, Phe-Met, Met-Phe, Leu-Met, Met-Leu, Tyr-Met, Met-Tyr, Met-Met, Tyr-Tyr, Ile-Met, Met-Ile, Leu-Ile, Ile-Leu, using the corresponding polynucleotides.

Useful properties have already been demonstrated for some linear dipeptides and their derivatives in various fields such as pharmaceuticals, health-care products, food supplements, cosmetics and the like.

For example, the Val-Tyr and Ile-Tyr dipeptides have been shown to inhibit angiotensin-converting enzyme (ACE) activity (Maruyama et al., J. Jpn. Soc. Food Sci. Technol. 2003, 50, 310-315) and they also have an in vivo antihypertensive effect (Tokunaga et al., J. Jpn. Soc. Food Sci. Technol. 2003, 50, 457-462; Matsui et al., Clin. Exp. Pharmacol. Physiol., 2003, 4, 262-265). Many other dipeptides (e.g. Val-Trp, Val-Phe, Ile-Trp, Ala-Tyr) are also known as ACE inhibitory products (Das and Soffer, J. Biol. Chem., 1975, 250, 6762-6768; Cheung et al., J. Biol. Chem., 1980, 255, 401-407).

Kyotorphin (Tyr-Arg), a neurodipeptide first isolated in the bovine brain and later found in the brains of many other species including humans (Takagi et al., Nature, 1979, 282, 410-412; Shiomi et al., Neuropharmacology, 1981, 20, 633-638), has also been shown to be a bioactive molecule. It possesses various opioid activities, including analgesic effects (Bean and Vaught, Eur. J. Pharmacol., 1984, 105, 333-337). D-Kyotorphin (i.e. Tyr-D-Arg) or N-methylated kyotorphin (i.e. TyrΨ[CON(Me)]-Arg) analogues exhibit a stronger in vivo analgesic effect than that of natural kyotorphin, probably due to their better resistance to peptide degradation (Takagi et al., CMLS, 1982, 38, 1344-1345; Ueda et al., Peptides, 2000, 21, 717-722).

Other examples of useful dipeptides are carnosine (β-Ala-His) and homocarnosine (γ-aminobutyryl-His), which are found in several human tissues. Their physiological functions are unknown, although various potential prophylactic or therapeutic applications in diabetic secondary complications (e.g. cataracts), atherosclerosis, cancer or inflammatory diseases have been reported (see Hipkiss, Int. J. Biochem. Cell Biol., 1998, 30, 863-868). Carnosine is presently used as a nutritional supplement in human health because it is believed to delay senescence and provoke cellular rejuvenation.

Linear dipeptides are also found in some nutritional supplements, particularly those marketed as sports and fitness products but also in total parenteral nutrition (TPN) and intravenous nutrition (IVN) products. They are used as delivery forms of amino acids that are unstable and insoluble in water such as glutamine or tyrosine.

Gly-Gln and Ala-Gln are used in TPN (Jiang et al., J. Parenter. Enteral Nut., 1993, 17, 134-141) to compensate for glutamine depletion, which is a feature of metabolic stress such as trauma, infection, or cancer (Zhou et al., J. Parenter. Enteral Nut., 2003, 27, 241-245).

In the same way, Ala-Tyr, Gly-Tyr and Tyr-Arg are used in IVN for providing tyrosine in an easily administrable form (Kee and Smith, Nutrition, 1996, 12, 577-577; Himmelseher et al., J. Parenter. Enteral Nut., 1996, 20, 281-286).

Finally, linear dipeptides are also used in the food industry as flavoring agents, as exemplified by the aspartame molecule (Asp-Phe-OMe), which is used as a sugar substitute marketed worldwide. It is often provided as a table condiment and it is commonly used in diet food or drinks.

Known methods for producing linear dipeptides include chemical synthesis, extraction from natural producer organisms and also enzymatic methods.

Chemical methods can be used to synthesize dipeptide derivatives but they are considered to be disadvantageous with respect to cost, as they often necessitate protection and deprotection steps in the linear dipeptide synthesis. Moreover, they are not environment-friendly methods, as they use large amounts of organic solvents and the like.

Extraction of linear dipeptides from natural prokaryote or eukaryote producers can be used, but the productivity and yield are generally low because the overall content of a desired dipeptide derivative in natural products is often low and producer organisms can be difficult to manipulate. Another significant disadvantage is that not all potential linear dipeptides are generally present in a single natural (e.g. genetically unaltered) product or organism.

Enzymatic methods, i.e. methods utilizing enzymes either in vivo (e.g. in the culture of microorganisms expressing endogenous or heterologous dipeptide-synthesizing enzymes or microorganism cells isolated from the culture medium) or in vitro (e.g. purified dipeptide-synthesizing enzymes), can be used.

The following methods are already known:

A method utilizing a reverse reaction of protease (Bergmann and Fraenkel-Conrat, J. Biol. Chem., 1937, 119, 707-720); however, the method utilizing a reverse reaction of protease requires the introduction and removal of protective groups for functional groups of the amino acids used as substrates, which causes difficulties in raising the efficiency of the peptide-forming reaction and in preventing a peptidolytic reaction.

Methods utilizing thermostable aminoacyl t-RNA synthetase (Japanese Patent Application No. 146539/83, Japanese Patent Application No. 209991/83, Japanese Patent Application No. 209992/83 and Japanese Patent Application No. 106298/84); the methods utilizing thermostable aminoacyl t-RNA synthetase have problems in that the expression of this enzyme and the prevention of side reactions forming unwanted by-products other than the desired products are difficult.

A method utilizing reverse reaction of proline iminopeptidase (WO03/010307); the method utilizing proline iminopeptidase requires amidation of one of the amino acids used as substrates, which again makes such methods difficult to conduct.

Methods utilizing non-ribosomal peptide synthetase (hereinafter referred to as NRPS) (Doekel and Marahiel, Chem. Biol., 2000, 7, 373-384; Dieckmann et al., FEBS Lett., 2001, 498, 42-45; U.S. Pat. No. 5,795,738 and U.S. Pat. No. 5,652,116). The methods utilizing NRPS are inefficient in that the supply of coenzyme 4′-phosphopantetheine is necessary.

There also exists a group of peptide synthetases that have lower enzyme molecular weights than that of NRPS and do not require coenzyme 4′-phosphopantetheine; for example, gamma-glutamylcysteine synthetase, glutathione synthetase, D-alanyl-D-alanine (D-Ala-D-Ala) ligase, and poly-gamma-glutamate synthetase. Most of these enzymes utilize D-amino acids as substrates or catalyze peptide bond formation at the gamma-carboxyl group. As a result, they cannot be used for the synthesis of dipeptides by peptide bond formation at the alpha-carboxyl group of L-amino acid.

An example of an enzyme capable of dipeptide synthesis by forming a peptide bond at the alpha-carboxyl group of L-amino acid is bacilysin synthetase (bacilysin is a dipeptide antibiotic derived from a microorganism belonging to the genus Bacillus). Bacilysin synthetase is known to have the activity to synthesize bacilysin [L-alanyl-L-anticapsin (L-Ala-L-anticapsin)] and L-alanyl-L-alanine (L-Ala-L-Ala), but there is no information about its ability to synthesize other dipeptides (Sakajoh et al., J. Ind. Microbiol. Biotechnol., 1987, 2, 201-208; Yazgan et al., Enzyme Microbial Technol., 2001, 29, 400-406).

As for the bacilysin biosynthetase genes in Bacillus subtilis 168, whose entire genome has been sequenced (Kunst et al., Nature, 1997, 390, 249-256), it is known that the productivity of bacilysin is increased by amplification of bacilysin operons containing ORFs ywfA-F (WO00/03009).

Recently, it has been demonstrated that the ywfE ORF encodes an L-amino acid ligase responsible for the synthesis of alpha-dipeptides from L-amino acid substrates. The enzyme was shown to have a broad substrate specificity leading to the formation of a wide variety of alpha-dipeptides (Tabata et al., J. Bacteriol., 2005, 187, 5195-5202; U.S. Patent Application No 20050287626).

The Inventors have previously reported that AlbC (albC gene product), which has no similarities with NRPS, was responsible for the formation of cyclo(L-Phe-L-Leu) and cyclo(L-Phe-L-Phe) during the biosynthesis of the anti-bacterial substance albonoursin (cyclo(deltaPhe-deltaLeu)) in Streptomyces noursei ATCC 11455. The expression of AlbC from S. noursei in heterologous strain S. lividans TK21 or Escherichia coli led to the production of cyclo(L-Phe-L-Leu) and cyclo(L-Phe-L-Phe) that were secreted in the culture medium (Lautru et al., Chem. Biol., 2002, 9, 1355-1364; French Patent 2841260 and WO2004/000879).

More recently, AlbC from S. noursei (SEQ ID NO:1) and its homologue from S. albulus (99% sequence identity (238 amino acids identical/239 amino acids) and 100% sequence similarity over 239 residues) were shown to be able to form straight-chain dipeptides from one or more kinds of amino acids. A Patent Application (U.S. Patent Application No 20050287626) has been filed by Kyowa Hakko Kogyo Co.

The types of linear dipeptides that AlbC can produce have been reported as being combinations of phenylalanine, leucine and alanine.

The invention relates to a process to create a more diverse set of linear-chain dipeptides using cyclodipeptide synthases (CDSs), a new family of enzymes characterized by the Inventors and defined by the presence of a specific sequence signature. The Inventors have surprisingly found that AlbC from S. noursei and S. albulus is just one member of the CDS family and that the other members of the family identified by the Inventors in this application display far lower sequence identity with AlbC from S. noursei, only 23-33%, and 41-53% sequence similarity over 212-226 residues.

The Inventors have also surprisingly found that the diverse members of the CDS family retain the required functionality to catalyse the synthesis of linear dipeptides. Also surprisingly, these different members of the family exhibit a very useful diversity in the species of linear dipeptides which they can form, being able to catalyse the formation of linear dipeptides which are not formed by AlbC, and AlbC itself produces a far wider range of linear dipeptides than has been previously reported.

The Inventors provide the materials to carry out such a process and in particular provide the necessary nucleic acid and peptide sequences to code for the various CDS members they have identified, as well as vectors to genetically alter suitable microorganisms to express these enzymes.

The Inventors also provide the means to identify further members of this family using a variety of searching strategies, allowing further members to be isolated and characterized, further increasing the types of linear dipeptides which can be produced according to the current invention.

The invention relates to the use of an isolated, natural or synthetic protein or an active fragment of such a protein, selected from the group consisting of proteins or fragments thereof having at least 20% identity and no more than 90% identity with SEQ ID NO:1, which corresponds to the AlbC protein from S. noursei. This protein or an active fragment of it has the ability to catalyse the formation of a linear dipeptide of the general formula (i):

R¹-R²  (i)

(wherein R¹ and R², which may be the same or different, each represent any amino acid).

An active fragment of the protein is one which displays the ability to catalyse the formation of a linear dipeptide at a level that is statistically significantly elevated relative to the basal level of production for such substances. In particular, an active fragment is considered to need to be at least seven amino acid residues in length to have functionality.

The percentages of sequence identity and sequence similarity defined herein were obtained using the BLAST program (blast2seq, default parameters) (Tatusova and Madden, FEMS Microbiol Lett., 1999, 174, 247-250).

Such percentage sequence identity and similarity are derived from a full length comparison with SEQ ID NO:1, as shown in FIG. 1 herein; preferably these percentages are derived by calculating them on an overlap representing a percentage of length of said sequences as shown in FIG. 1.

Preferably the protein or an active fragment thereof has at least 20% and no more than 50% identity with SEQ ID NO:1.

Most preferably the protein or an active fragment thereof has at least 20% and no more than 35% identity with SEQ ID NO:1.
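As an illustration of how the identity ranges stated above partition candidate proteins, the following minimal sketch computes a percentage identity from a count of identical residues and an aligned length, then reports which of the stated ranges it falls in. The function names and the tier labels are hypothetical and are not part of the BLAST program; only the numeric bounds (20%, 35%, 50% and 90%) are taken from the text.

```python
def percent_identity(identical: int, aligned_length: int) -> float:
    """Percentage of identical residues over an aligned length."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * identical / aligned_length


def identity_tier(pct: float) -> str:
    """Classify a percentage identity with SEQ ID NO:1 (labels are illustrative)."""
    if pct < 20 or pct > 90:
        return "outside"          # not within the 20-90% range
    if pct <= 35:
        return "most preferred"   # 20-35%
    if pct <= 50:
        return "preferred"        # 20-50%
    return "claimed"              # 20-90%


# Values quoted in the text: the S. albulus homologue (238 identical of 239)
# and Rv2275 (33% identity with AlbC).
albulus = percent_identity(238, 239)
print(round(albulus, 1), identity_tier(albulus))
print(33, identity_tier(33))
```

Under these bounds the S. albulus homologue exceeds the 90% ceiling, whereas a protein with 33% identity such as Rv2275 falls in the narrowest range.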

Comparison of the 239-amino acid sequence of AlbC, the first CDS described (Lautru et al., Chem. Biol., 2002, 9, 1355-1364), with databases led to the identification of seven hypothetical proteins of unknown function with moderate identity and similarity (FIG. 1). One 289-amino acid hypothetical protein that displays 33% identity and 53% similarity with AlbC over 212 residues was encoded by the genome of several organisms belonging to the Mycobacterium tuberculosis complex. This protein is named Rv2275 (SEQ ID NO:2) in Mycobacterium tuberculosis H37Rv (Acc no. NP 216791), MT2335 in M. tuberculosis CDC 1551 (Acc no. NP 336805), MRA2294 in M. tuberculosis H37Ra (Acc no. YP001283620), TBFG12300 in M. tuberculosis F11 (Acc no. YP001288233) and Mb2298 in Mycobacterium bovis AF2122/97 (Acc no. NP 855947). Therefore, the protein encoded by several Mycobacteria strains will be called hereinafter Rv2275 (SEQ ID NO:2). Rv2275 is longer than AlbC and comprises a 49 amino acid N-terminal part that does not align with AlbC. Another hypothetical protein was found in M. bovis BCG strain Pasteur 1173P2. This protein named BCG2292 (Acc no. YP978381, SEQ ID NO:34) is identical to the Rv2275 (SEQ ID NO:2) protein except that the E at residue 261 is replaced by A in SEQ ID NO:2.

Database searches also revealed three additional different homologous proteins originating from Bacillus species; two identical 249-amino acid hypothetical proteins named YvmC (hereinafter referred to as YvmC-Blic, SEQ ID NO:4) that present 29% identity and 47% similarity with AlbC over 221 residues were found in Bacillus licheniformis ATCC 14580 (Acc no. AAU25020) and Bacillus licheniformis DSM 13 (Acc no. AAU42391); one 248-amino acid YvmC (hereinafter referred to as YvmC-Bsub, SEQ ID NO:3) protein with 29% identity and 46% similarity with AlbC over 226 residues was encoded by Bacillus subtilis subsp. subtilis strain 168 (Acc no. CAB15512); one 238-amino acid hypothetical protein named RBTH_07362 (hereinafter referred to as YvmC-Bthu, SEQ ID NO:5) that displays 26% identity and 45% similarity over 214 residues originated from Bacillus thuringiensis serovar israelensis ATCC 35646 (Acc no. EA057133). In pairwise comparisons, these three different proteins from Bacillus species share higher sequence identity and similarity (61-70% identities and 76-81% similarities over 236-247 residues).

Among proteins homologous to AlbC also figured a 234-amino acid hypothetical protein Plu0297 (SEQ ID NO:7) that presents 28% identity and 49% similarity with AlbC over 224 residues and that was found in Photorhabdus luminescens subsp. laumondii TTO1 (NP 927658).

Another AlbC homologous protein was encoded by the pSHaeC plasmid of about 8 kb harbored by the strain Staphylococcus haemolyticus JCSC1435; the protein named pSHaeC06 (SEQ ID NO:6) is 234 amino acids long and displays 20% identity and 44% similarity with AlbC over 220 amino acids (Acc no. YP 254604).

Another hypothetical protein was found homologous to AlbC in the genome of Corynebacterium jeikeium K411; the 216-amino acid protein named Jk0923 (Acc no. YP 250705, SEQ ID NO:8) presents 23% identity and 41% similarity over 212 residues with AlbC.

In all cases this correspondence occurs when the protein or an active fragment of it is compared to SEQ ID NO:1 using a pairwise comparison program such as BLAST to align these proteins or fragments thereof with SEQ ID NO:1 and allow the determination of where in SEQ ID NO:1 the conserved sequences appear.

The amino acid sequence alignment of AlbC with its seven related hypothetical proteins showed that only 13 positions are conserved among all proteins, but it highlighted two particularly well-conserved regions, one comprising residues 31 to 37 (AlbC numbering) and the other one containing residues 178 to 184 (AlbC numbering) (FIG. 1).

These two regions were respectively used to define two sequence patterns, H-X-[LVI]-[LVI]-G-[LVI]-S (SEQ ID NO:9) and Y-[LVI]-X-X-E-X-P (SEQ ID NO:10), whose simultaneous presence in a protein when separated by 120-160 amino acids was scanned for in Uniprot (Nucleic Acids Res. 2007 January; 35(Database issue):D193-7.) using PATTINPROT (Combet et al., TIBS, 2000, 25, 147-150).

This search revealed only AlbC and its hereabove mentioned homologues (Rv2275 and BCG2292, YvmC-Bsub, YvmC-Blic, YvmC-Bthu, Plu0297, pSHaeC06 and Jk0923). So, it has been shown that this first sequence signature can be used to search and define a new family of proteins related to AlbC; the Inventors have named all these enzymes cyclodipeptide synthases (CDSs). It is shown below that the eight proteins belonging to this family are able to synthesize diverse linear dipeptides.
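The two-pattern scan described above can be sketched with regular expressions. This is only an illustrative sketch and does not reproduce PATTINPROT: it assumes that "separated by 120-160 amino acids" means the number of residues lying between the end of the first pattern and the start of the second, and the function name and the test sequence are hypothetical placeholders rather than real protein data.

```python
import re

# SEQ ID NO:9 and SEQ ID NO:10 written as regular expressions (X = any residue).
MOTIF_1 = r"H.[LVI][LVI]G[LVI]S"
MOTIF_2 = r"Y[LVI]..E.P"
MOTIF_LEN = 7  # both patterns span seven residues


def find_cds_signature(seq, min_gap=120, max_gap=160):
    """Return (start1, start2) 0-based pairs where both motifs occur with an
    intervening stretch of min_gap..max_gap residues (assumed convention)."""
    # Lookahead matches so that overlapping occurrences are not missed.
    starts1 = [m.start() for m in re.finditer(f"(?={MOTIF_1})", seq)]
    starts2 = [m.start() for m in re.finditer(f"(?={MOTIF_2})", seq)]
    hits = []
    for s1 in starts1:
        for s2 in starts2:
            gap = s2 - (s1 + MOTIF_LEN)
            if min_gap <= gap <= max_gap:
                hits.append((s1, s2))
    return hits


# Synthetic placeholder sequence: first motif at residues 31-37 and second at
# residues 178-184 (1-based), mirroring the AlbC positions given in the text.
toy = "A" * 30 + "HALVGIS" + "A" * 140 + "YLAAEAP" + "A" * 55
print(find_cds_signature(toy))
```

Under this convention the two AlbC regions (residues 31 to 37 and 178 to 184) are separated by 140 residues, which lies inside the 120-160 window.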

In a preferred embodiment of said use, the protein or an active fragment of it has a first conserved amino acid sequence of the general sequence SEQ ID NO:9:

H-X-[LVI]-[LVI]-G-[LVI]-S, (SEQ ID NO: 9)

wherein H=histidine, X=any amino acid, [LVI]=any one of leucine, valine or isoleucine, G=glycine and S=serine.

In another preferred embodiment of said use, the protein or an active fragment of it has a second conserved amino acid sequence of the general sequence SEQ ID NO:10:

Y-[LVI]-X-X-E-X-P, (SEQ ID NO: 10)

wherein Y=tyrosine, [LVI]=any one of leucine, valine or isoleucine, X=any amino acid, E=glutamic acid and P=proline.

Most preferably the protein or an active fragment of it has both the first and the second conserved amino acid sequences.

In another preferred embodiment of said use, the first conserved amino acid sequence and the second amino acid sequence are separated by at least 120 amino acid residues and no more than 160 amino acid residues.

Most preferably the first conserved amino acid sequence and the second amino acid sequence are separated by at least 140 amino acid residues and no more than 150 amino acid residues.

In another preferred embodiment of said use, the first conserved amino acid sequence corresponds to residues 31 to 37 of SEQ ID NO:1, in the protein or an active fragment of it.

In another preferred embodiment of said use, the second conserved amino acid sequence corresponds to residues 178 to 184 of SEQ ID NO:1 in the protein or an active fragment of it.

The Inventors have defined a new family of proteins related to AlbC, based on the presence of specified sequence signatures and similarities in size; they have now found that, unexpectedly, all members of the newly identified CDS family are also able to synthesize linear dipeptides.

In another preferred embodiment of said use, the protein or an active fragment of it was isolated from a microorganism belonging to the genus Bacillus, Corynebacterium, Mycobacterium, Streptomyces, Photorhabdus or Staphylococcus.

According to a more preferred embodiment of said use, the protein or an active fragment of it was isolated from a microorganism selected from the list Bacillus licheniformis, Bacillus subtilis subsp. subtilis, Bacillus thuringiensis serovar israelensis, Photorhabdus luminescens subsp. laumondii, Staphylococcus haemolyticus, Corynebacterium jeikeium, Mycobacterium tuberculosis, Mycobacterium bovis or Mycobacterium bovis BCG.

In another preferred embodiment of said use, the protein or an active fragment of it is selected from the group consisting of AlbC (SEQ ID NO:1), Rv2275 (SEQ ID NO:2), MT2335 (SEQ ID NO:2), MRA2294 (SEQ ID NO:2), TBFG12300 (SEQ ID NO:2), Mb2298 (SEQ ID NO:2), BCG2292 (SEQ ID NO:34), YvmC-Bsub (SEQ ID NO:3), YvmC-Blic (SEQ ID NO:4), YvmC-Bthu (SEQ ID NO:5), pSHaeC06 (SEQ ID NO:6), Plu0297 (SEQ ID NO:7), Jk0923 (SEQ ID NO:8), AlbC-his (SEQ ID NO:35), Rv2275-his (SEQ ID NO:36), YvmC-Bsub-his (SEQ ID NO:37).

Preferably the dipeptide may be in particular Phe-Leu, Leu-Phe, Phe-Phe, Phe-Tyr, Tyr-Phe, Leu-Leu, Leu-Tyr, Tyr-Leu, Phe-Met, Met-Phe, Leu-Met, Met-Leu, Tyr-Met, Met-Tyr, Met-Met, Tyr-Tyr, Ile-Met, Met-Ile, Ile-Leu.

The present invention also provides the use of an isolated, natural or synthetic nucleic acid sequence coding for a protein or an active fragment thereof, as specified herein.

The invention further relates to the use of a polynucleotide selected from:

a) a polynucleotide encoding a cyclodipeptide synthase as defined above;

b) a complementary polynucleotide of the polynucleotide a);

c) a polynucleotide which hybridizes to polynucleotide a) or b) under stringent conditions, for the synthesis of a linear dipeptide.

Advantageously, said polynucleotide is selected from the group consisting of the polynucleotides of sequences SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13-16, 20 or 21. The polynucleotides of sequences SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13-16 encode respectively the polypeptides of sequences SEQ ID NO:1-5 and SEQ ID NO:7, the polynucleotides SEQ ID NO:20 and 21 encode respectively the polypeptides of sequences SEQ ID NO:6 and 8; furthermore, the polynucleotide corresponding to positions 114-861 of SEQ ID NO:17 encodes the polypeptide AlbC-his of SEQ ID NO:35, the polynucleotide corresponding to positions 114-1008 of SEQ ID NO:18 encodes the polypeptide Rv2275-his of SEQ ID NO:36 and the polynucleotide corresponding to positions 114-885 of SEQ ID NO:19 encodes the polypeptide YvmC-Bsub-his of SEQ ID NO:37.

The term “hybridize(s)” as used herein refers to a process in which polynucleotides and/or oligonucleotides hybridize to the recited nucleic acid sequence or parts thereof. Therefore, said nucleic acid sequence may be useful as probes in Northern or Southern Blot analysis of RNA or DNA preparations, respectively, or can be used as oligonucleotide primers in PCR analysis dependent on their respective size. Preferably, said hybridizing oligonucleotides comprise at least 10 and more preferably at least 15 nucleotides, while a hybridizing polynucleotide of the present invention to be used as a probe preferably comprises at least 100, more preferably at least 200, or most preferably at least 500 nucleotides.

It is well known in the art how to perform hybridization experiments with nucleic acid molecules, i.e. the person skilled in the art knows what hybridization conditions she/he has to use in accordance with the present invention. Such hybridization conditions are referred to in standard text books such as Sambrook et al., Molecular Cloning: A Laboratory Manual; Cold Spring Harbor Laboratory Press, 2nd edition 1989 and 3rd edition 2001; Gerhardt et al.; Methods for General and Molecular Bacteriology; ASM Press, 1994; Lefkovits; Immunology Methods Manual: The Comprehensive Sourcebook of Techniques; Academic Press, 1997; Golemis; Protein-Protein Interactions: A Molecular Cloning Manual; Cold Spring Harbor Laboratory Press, 2002 and other standard laboratory manuals known by the person skilled in the art or as recited above. Preferred in accordance with the present invention are stringent hybridization conditions.

“Stringent hybridization conditions” refer, e.g. to an overnight incubation at 42° C. in a solution comprising 50% formamide, 5×SSC (750 mM NaCl, 75 mM sodium citrate), 50 mM sodium phosphate (pH 7.6), 5×Denhardt's solution, 10% dextran sulfate, and 20 μg/ml denatured, sheared salmon sperm DNA, followed e.g. by washing the filters in 0.2×SSC at about 65° C.

Also contemplated are nucleic acid molecules that hybridize at low stringency hybridization conditions. Changes in the stringency of hybridization and signal detection are primarily accomplished through the manipulation of formamide concentration, salt conditions, or temperature. For example, lower stringency conditions include an overnight incubation at 37° C. in a solution comprising 6×SSPE (20×SSPE=3 mol/l NaCl; 0.2 mol/l NaH₂PO₄; 0.02 mol/l EDTA, pH 7.4), 0.5% SDS, 30% formamide, 100 μg/ml salmon sperm blocking DNA; followed by washes at 50° C. with 1×SSPE, 0.1% SDS.
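The "×" notation used for SSC and SSPE is a simple linear scaling of a reference composition, so the salt concentrations of the hybridization and wash buffers follow by proportion. The short sketch below works this out from the two reference compositions quoted in the text (5×SSC and 20×SSPE); the function and variable names are hypothetical, and all concentrations are expressed in mM.

```python
# Reference compositions quoted in the text, in mM.
SSC_5X_MM = {"NaCl": 750.0, "sodium citrate": 75.0}
SSPE_20X_MM = {"NaCl": 3000.0, "NaH2PO4": 200.0, "EDTA": 20.0}


def scale_buffer(reference, reference_fold, target_fold):
    """Scale every component of a reference buffer to another fold strength."""
    factor = target_fold / reference_fold
    return {name: conc * factor for name, conc in reference.items()}


wash_ssc = scale_buffer(SSC_5X_MM, 5, 0.2)    # stringent wash, 0.2xSSC
hyb_sspe = scale_buffer(SSPE_20X_MM, 20, 6)   # low-stringency hybridization, 6xSSPE
wash_sspe = scale_buffer(SSPE_20X_MM, 20, 1)  # low-stringency wash, 1xSSPE

for label, buf in [("0.2xSSC", wash_ssc), ("6xSSPE", hyb_sspe), ("1xSSPE", wash_sspe)]:
    print(label, {name: round(conc, 3) for name, conc in buf.items()})
```

By this proportion the stringent 0.2×SSC wash contains about 30 mM NaCl and 3 mM sodium citrate, whereas the low-stringency 1×SSPE wash contains about 150 mM NaCl, which illustrates the higher salt concentration associated with lower stringency.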

In addition, to achieve even lower stringency, washes performed following stringent hybridization can be done at higher salt concentrations (e.g. 5×SSC). It is of note that variations in the above conditions may be accomplished through the inclusion and/or substitution of alternate blocking reagents used to suppress background in hybridization experiments. Typical blocking reagents include Denhardt's reagent, BLOTTO, heparin, denatured salmon sperm DNA, and commercially available proprietary formulations.

The present invention also provides a recombinant vector comprising a nucleic acid coding sequence as defined hereabove. This vector is configured to introduce the nucleic acid coding sequence into a host cell and this coding sequence is thereby transcribed and translated by the endogenous transcription and translation mechanisms of the host cell.

The recombinant vector may comprise coding sequences for at least two proteins or active fragments thereof as defined hereabove. By providing multiple coding sequences, the Inventors provide a means of producing several enzyme-specific linear dipeptides, by including suitable coding sequences from several such CDS enzymes.

Hence, the at least two coding sequences come from different genes.

Alternatively the at least two coding sequences come from a single gene. In such a case the provision of multiple coding sequences for the same gene product allows the amplification of the exogenous gene product levels, so increasing the rate of linear dipeptide formation.

Preferably the host cell is a prokaryote. Prokaryotic cells are generally simple to culture and easily stored between rounds of fermentation, making them an ideal system in which to produce on a large scale significant levels of linear dipeptide from simple media and growing conditions.

Most preferably the host cell is Escherichia coli, the best characterized prokaryotic organism, for which a plurality of different expression systems and culture technologies exist.

The present invention further relates to a recombinant vector comprising said nucleic acid coding sequence as defined hereabove. This vector is configured to express the nucleic acid coding sequence in a cell free expression system by the endogenous mechanisms of this cell free expression system.

The present invention also provides a method for the production of a linear dipeptide, comprising the steps:

a) culturing upon a medium a host cell which has the ability to produce a protein or an active fragment thereof having the activity to form a linear dipeptide from one or more kinds of amino acids;

b) allowing the linear dipeptide to form and accumulate in the host cell and in some cases also in the medium;

c) recovering the linear dipeptide from the cellular extract and medium;

wherein the protein or an active fragment thereof is selected from the group consisting of proteins and fragments thereof having at least 20% identity and no more than 90% identity with SEQ ID NO:1.

Preferably the protein or an active fragment thereof is also encoded by an endogenous gene of the host cell.

Alternatively the protein or an active fragment thereof is not encoded by an endogenous gene of said host cell.

The present invention relates also to a method for the production of a linear dipeptide, comprising the steps:

a) inducing a cell free expression system to produce a protein or an active fragment thereof, having the activity to form a linear dipeptide from one or more kinds of amino acids;

b) introducing at least one amino acid substrate to the protein or an active fragment thereof;

c) allowing the linear dipeptide to form and accumulate;

d) recovering the linear dipeptide;

wherein the protein or an active fragment thereof is selected from the group consisting of proteins and fragments thereof having at least 20% identity and no more than 90% identity with SEQ ID NO:1.

The present invention further provides a method of identifying polypeptides that catalyse the formation of a linear dipeptide of the general formula (i):

R¹-R²  (i)

(wherein R¹ and R², which may be the same or different and each may represent any amino acid);

characterised in that it comprises the steps:

a) identifying a candidate polypeptide sequence as having at least one of the following motifs:

H-X-[LVI]-[LVI]-G-[LVI]-S (SEQ ID NO: 9)

wherein H=histidine, X=any amino acid, [LVI]=any one of leucine, valine or isoleucine, G=glycine and S=serine; and

wherein at least one of said H, LVI, G or S can be another amino acid, namely H can be replaced by any one of Lysine or Arginine; LVI can be replaced by any one of Glycine, Alanine, Leucine, Valine or Isoleucine; G can be replaced by any one of Glycine, Alanine, Leucine, Valine or Isoleucine; S can be replaced by Cysteine, Threonine or Methionine.

Y-[LVI]-X-X-E-X-P (SEQ ID NO: 10)

wherein Y=tyrosine, [LVI]=any one of leucine, valine or isoleucine, X=any amino acid, E=glutamic acid and P=proline; and

wherein at least one of said Y, LVI, E, X or P can be another amino acid, namely Y can be replaced by any one of Phenylalanine or Tryptophan; LVI can be replaced by any one of Glycine, Alanine, Leucine, Valine or Isoleucine; E can be replaced by any one of Aspartic Acid, Asparagine, Glutamine; P can be replaced by any one of Glycine, Alanine, Leucine, Valine or Isoleucine;

b) creating a polypeptide expression construct by linking said candidate polypeptide coding sequence to promoter sequences configured to express said candidate peptide at an appreciable level;

c) introducing said polypeptide expression construct into at least one cell and inducing the uptake of said polypeptide expression construct by said at least one cell or a cell free expression system;

d) monitoring the levels and types of linear dipeptides in the growth medium of said at least one cell or said cell free expression system;

e) comparing the levels of linear dipeptides in the presence of said polypeptide expression construct to the levels of linear dipeptides in the absence of said polypeptide expression construct to determine the relative level of production of linear dipeptides by said polypeptide expression construct; and

f) correlating the relative production of linear dipeptides to expression of said candidate polypeptide in said at least one cell or said cell free expression system.

The Inventors therefore provide a systematic approach to the identification of further enzymes capable of synthesizing linear dipeptides. This approach uses the two conserved motifs which the Inventors have identified for the first time and allows the identification of suitable candidate polypeptides in silico which have one or both of these domains or derivatives thereof.

These candidate polypeptides are then linked to a suitable promoter, whose properties allow the expression of the candidate polypeptide at a level where its activity becomes appreciable. The exact level required to become appreciable will vary depending upon the exact expression system used and as such specific details are not provided by the Inventors, as this is a common experimental practice.
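The comparison in steps e) and f) above amounts to relating dipeptide levels measured with the expression construct to those measured without it. The following sketch shows one simple way to express this as a ratio of mean levels per dipeptide. It is an assumption-laden illustration: the text does not specify the measurement method, units, or statistical treatment, the function name is hypothetical, and the numbers are arbitrary placeholders rather than experimental results.

```python
from statistics import mean


def relative_production(with_construct, without_construct):
    """Ratio of mean dipeptide level with the construct to the mean level
    without it, per dipeptide. Inputs map a dipeptide name to a list of
    replicate measurements (arbitrary units)."""
    ratios = {}
    for dipeptide, levels in with_construct.items():
        baseline = mean(without_construct.get(dipeptide, [0.0]))
        induced = mean(levels)
        # A zero baseline means the dipeptide was not detected in the control.
        ratios[dipeptide] = float("inf") if baseline == 0 else induced / baseline
    return ratios


# Placeholder replicate values, for illustration only.
expressed = {"Phe-Leu": [10.0, 14.0], "Met-Met": [5.0, 7.0]}
control = {"Phe-Leu": [2.0, 4.0]}
print(relative_production(expressed, control))
```

A ratio alone does not establish statistical significance; in practice replicate measurements would also be subjected to an appropriate test before a dipeptide is attributed to the candidate polypeptide.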

According to a preferred embodiment of said method, the said first conserved motif (SEQ ID NO:9) and the second conserved motif (SEQ ID NO:10) are separated by at least 75 and no more than 250 amino acids.

The identification system for candidate polypeptides may also thereforeencompass candidate molecules in which the first and second conservedmotifs (SEQ ID NO:9 and 10 respectively) where both present areseparated by a variable stretch of 75 and 250 amino acids.

Preferably the first conserved motif (SEQ ID NO:9) and/or the secondconserved motif (SEQ ID NO:10) comprise more than one residue change.

The present invention also provides a method of identifying polypeptides that catalyse the formation of a linear dipeptide of the general formula (i):

R¹-R²  (i)

(wherein R¹ and R², which may be the same or different and each may represent any amino acid);

characterized in that it comprises the steps:

a) identifying a candidate polypeptide sequence as having at least 20% identity and no more than 90% identity with SEQ ID NO:1; or having at least 20% identity with any one of SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37;

b) creating a polypeptide expression construct by linking the candidate polypeptide sequence to promoter sequences configured to express said candidate peptide at an appreciable level;

c) introducing the polypeptide expression construct into at least one cell or a cell free expression system and inducing the expression of the polypeptide expression construct by the at least one cell or cell free expression system;

d) monitoring the levels and types of linear dipeptides in the cellular extract and growth medium of the at least one cell or the cell free expression system;

e) comparing the levels of linear dipeptides in the presence of the polypeptide expression construct to the levels of linear dipeptides in the absence of the polypeptide expression construct to determine the relative level of production of linear dipeptides by the polypeptide expression construct; and

f) correlating the relative production of linear dipeptides to the expression of the candidate polypeptide in said at least one cell or the cell free expression system.
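The identity criterion of step a) can be illustrated with a short calculation. This is a sketch only: the text does not state how percent identity is computed, so the definition used here (identical positions divided by the number of alignment columns, ignoring columns where both rows are gaps) is an assumption, as are the function names.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two rows of a pairwise alignment ('-' = gap).

    Assumed definition: identical residues / alignment columns,
    skipping columns in which both rows carry a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def is_candidate(identity_to_seq1, identities_to_others, low=20.0, high=90.0):
    """Apply step a): 20-90% identity with SEQ ID NO:1, or at least 20%
    identity with any one of the other listed sequences."""
    in_window = low <= identity_to_seq1 <= high
    return in_window or any(i >= low for i in identities_to_others)
```

For instance, `percent_identity("ACDE-", "ACQEF")` counts three identical positions over five columns, and a sequence that is 95% identical to SEQ ID NO:1 only qualifies if it also reaches 20% identity with one of the other reference sequences.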

For a better understanding of the invention and to show how the same may be carried into effect, there will now be shown by way of example only, specific embodiments, methods and processes according to the present invention with reference to the accompanying drawings in which:

FIG. 1 illustrates the amino acid sequence alignment of AlbC (SEQ ID NO:1) from Streptomyces noursei with other CDS proteins. The related proteins are Rv2275 (SEQ ID NO:2) from Mycobacterium tuberculosis, YvmC from Bacillus subtilis (herein referred to as YvmC-Bsub, SEQ ID NO:3), YvmC from Bacillus licheniformis (herein referred to as YvmC-Blic, SEQ ID NO:4), YvmC from Bacillus thuringiensis (herein referred to as YvmC-Bthu, SEQ ID NO:5), pSHaeC06 (SEQ ID NO:6) from Staphylococcus haemolyticus, Plu0297 (SEQ ID NO:7) from Photorhabdus luminescens and Jk0923 (SEQ ID NO:8) from Corynebacterium jeikeium. The thirteen highly conserved positions (identical residue in all sequences) are indicated by a black background. Positions with moderate conservation are boxed.

FIG. 2 illustrates EICs of dipeptide m/z values specific to AlbC-his (SEQ ID NO:35) and detected from an LC-MS analysis of the soluble fraction of E. coli cells expressing AlbC-his (upper black traces) compared to the same set of EICs from an LC-MS analysis of the control sample (lower grey traces). Each specific EIC peak was labeled as specified in Table II for identification by MS and MS/MS illustrated in FIGS. 3 to 17.

FIG. 3 illustrates the MS and MS/MS spectra of the EIC peak 1 detected at 20.6 min during the analysis of the soluble fraction of E. coli cells expressing AlbC.

FIG. 4 illustrates the MS and MS/MS spectra of the EIC peak 2 detected at 22.0 min during the analysis of the soluble fraction of E. coli cells expressing AlbC.

FIG. 5 illustrates the MS and MS/MS spectra of the EIC peak 3 detected at 22.5 min during the analysis of the soluble fraction of E. coli cells expressing AlbC.

FIG. 6 illustrates the MS and MS/MS spectra of the EIC peak 4 detected at 22.9 min during the analysis of the soluble fraction of E. coli cells expressing AlbC.

FIG. 7 illustrates the MS and MS/MS spectra of the EIC peak 5 detected at 23.8 min during the analysis of the soluble fraction of E. coli cells expressing AlbC.

FIG. 8 illustrates the MS and MS/MS spectra of the EIC peak 6 detected at 25.0 min during the analysis of the soluble fraction of E. coli cells expressing AlbC.

FIG. 9 illustrates the MS and MS/MS spectra of the EIC peak 7 detected at 25.9 min during the analysis of the soluble fraction of E. coli cells expressing AlbC.

FIG. 10 illustrates the MS and MS/MS spectra of the EIC peak 8 detected at 26.6 min during the analysis of the soluble fraction of E. coli cells expressing AlbC.

FIG. 11 illustrates the MS and MS/MS spectra of the EIC peak 9 detected at 27.0 min during the analysis of the soluble fraction of E. coli cells expressing AlbC.

FIG. 12 illustrates the MS and MS/MS spectra of the EIC peak 10 detected at 27.3 min during the analysis of the soluble fraction of E. coli cells expressing AlbC.

FIG. 13 illustrates the MS and MS/MS spectra of the EIC peak 11 detected at 29.0 min during the analysis of the soluble fraction of E. coli cells expressing AlbC.

FIG. 14 illustrates the MS and MS/MS spectra of the EIC peak 12 detected at 29.3 min during the analysis of the soluble fraction of E. coli cells expressing AlbC.

FIG. 15 illustrates the MS and MS/MS spectra of the EIC peak 13 detected at 30.8 min during the analysis of the soluble fraction of E. coli cells expressing AlbC.

FIG. 16 illustrates the MS and MS/MS spectra of the EIC peak 14 detected at 31.5 min during the analysis of the soluble fraction of E. coli cells expressing AlbC.

FIG. 17 illustrates the MS and MS/MS spectra of the EIC peak 15 detected at 33.4 min during the analysis of the soluble fraction of E. coli cells expressing AlbC.

FIG. 18 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Met-Met. An EIC peak is detected at 19.4 minutes (FIG. 18 a).

FIG. 19 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Met-Tyr. An EIC peak is detected at 21.6 minutes (FIG. 19 a).

FIG. 20 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Ile-Met. An EIC peak is detected at 21.8 minutes (FIG. 20 a).

FIG. 21 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Tyr-Met. An EIC peak is detected at 22.8 minutes (FIG. 21 a).

FIG. 22 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Leu-Met. An EIC peak is detected at 22.9 minutes (FIG. 22 a).

FIG. 23 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Ile-Tyr. An EIC peak is detected at 23.3 minutes (FIG. 23 a).

FIG. 24 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Tyr-Tyr. An EIC peak is detected at 23.5 minutes (FIG. 24 a).

FIG. 25 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Leu-Tyr. An EIC peak is detected at 23.7 minutes (FIG. 25 a).

FIG. 26 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Met-Ile. An EIC peak is detected at 24.0 minutes (FIG. 26 a).

FIG. 27 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Ile-Ile. An EIC peak is detected at 24.1 minutes (FIG. 27 a).

FIG. 28 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Tyr-Ile. An EIC peak is detected at 24.4 minutes (FIG. 28 a).

FIG. 29 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Met-Leu. An EIC peak is detected at 25.3 minutes (FIG. 29 a).

FIG. 30 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Leu-Ile. An EIC peak is detected at 25.4 minutes (FIG. 30 a).

FIG. 31 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Tyr-Leu. An EIC peak is detected at 25.8 minutes (FIG. 31 a).

FIG. 32 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Ile-Leu. An EIC peak is detected at 26.1 minutes (FIG. 32 a).

FIG. 33 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Phe-Tyr. An EIC peak is detected at 26.7 minutes (FIG. 33 a).

FIG. 34 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Phe-Met. An EIC peak is detected at 27.1 minutes (FIG. 34 a).

FIG. 35 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Leu-Leu. An EIC peak is detected at 27.4 minutes (FIG. 35 a).

FIG. 36 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Phe-Ile. An EIC peak is detected at 28.7 minutes (FIG. 36 a).

FIG. 37 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Tyr-Phe. An EIC peak is detected at 29.0 minutes (FIG. 37 a).

FIG. 38 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Met-Phe. An EIC peak is detected at 29.5 minutes (FIG. 38 a).

FIG. 39 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Ile-Phe. An EIC peak is detected at 30.2 minutes (FIG. 39 a).

FIG. 40 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Phe-Leu. An EIC peak is detected at 30.8 minutes (FIG. 40 a).

FIG. 41 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Leu-Phe. An EIC peak is detected at 31.5 minutes (FIG. 41 a).

FIG. 42 illustrates the EIC and the MS and MS/MS spectra of the chemically-synthesized Phe-Phe. An EIC peak is detected at 33.4 minutes (FIG. 42 a).

FIG. 43 illustrates EICs of dipeptide m/z values specific to Rv2275-his (SEQ ID NO:36) and detected from an LC-MS analysis of the soluble fraction of E. coli cells expressing Rv2275-his (upper black traces) compared to the same set of EICs from an LC-MS analysis of the control sample (lower grey traces).

FIG. 44 illustrates the MS and MS/MS spectra of the EIC peak 1 detected at 23.3 min during the analysis of the soluble fraction of E. coli cells expressing Rv2275-his (SEQ ID NO:36).

FIG. 45 illustrates EICs of dipeptide m/z values specific to YvmC-Bsub-his (SEQ ID NO:37) and detected from an LC-MS analysis of the soluble fraction of E. coli cells expressing YvmC-Bsub-his (SEQ ID NO:37) (upper black traces) compared to the same set of EICs from an LC-MS analysis of the control sample (lower grey traces).

FIG. 46 illustrates the MS and MS/MS spectra of the EIC peak 1 detected at 20.6 min during the analysis of the soluble fraction of E. coli cells expressing YvmC.

FIG. 47 illustrates the MS and MS/MS spectra of the EIC peak 2 detected at 21.8 min during the analysis of the soluble fraction of E. coli cells expressing YvmC.

FIG. 48 illustrates the MS and MS/MS spectra of the EIC peak 3 detected at 22.8 min during the analysis of the soluble fraction of E. coli cells expressing YvmC.

FIG. 49 illustrates the MS and MS/MS spectra of the EIC peak 4 detected at 24.9 min during the analysis of the soluble fraction of E. coli cells expressing YvmC.

FIG. 50 illustrates the MS and MS/MS spectra of the EIC peak 5 detected at 25.4 min during the analysis of the soluble fraction of E. coli cells expressing YvmC.

FIG. 51 illustrates the MS and MS/MS spectra of the EIC peak 6 detected at 25.9 min during the analysis of the soluble fraction of E. coli cells expressing YvmC.

FIG. 52 illustrates the MS and MS/MS spectra of the EIC peak 7 detected at 26.8 min during the analysis of the soluble fraction of E. coli cells expressing YvmC.

FIG. 53 illustrates the MS and MS/MS spectra of the EIC peak 8 detected at 27.3 min during the analysis of the soluble fraction of E. coli cells expressing YvmC.

FIG. 54 illustrates the MS and MS/MS spectra of the EIC peak 9 detected at 29.2 min during the analysis of the soluble fraction of E. coli cells expressing YvmC.

FIG. 55 illustrates the MS and MS/MS spectra of the EIC peak 10 detected at 30.8 min during the analysis of the soluble fraction of E. coli cells expressing YvmC.

FIG. 56 illustrates the MS and MS/MS spectra of the EIC peak 11 detected at 31.4 min during the analysis of the soluble fraction of E. coli cells expressing YvmC.

FIG. 57 illustrates the MS and MS/MS spectra of the EIC peak 12 detected at 33.3 min during the analysis of the soluble fraction of E. coli cells expressing YvmC.

FIG. 58 summarizes an exhaustive screening protocol for linear dipeptides.

FIG. 59 shows a part of the alignment of all CDS sequences; the region used for design of the first primer is indicated by a line under the alignment. The numbering is that of AlbC from S. noursei. The degenerate amino acid sequence is shown with the corresponding nucleotide sequence. For nucleotides: B=C or G or T, N=A or C or G or T, R=A or G, S=C or G, W=A or T, Y=C or T.

FIG. 60 shows a part of the alignment of all CDS sequences; the region used for design of the second primer is indicated by a line under the alignment. The numbering is that of AlbC from S. noursei. The degenerate amino acid sequence is shown with the corresponding nucleotide sequence, and the complementary strand (at the bottom) used as primer. For nucleotides: D=A or G or T, K=G or T, M=A or C, N=A or C or G or T, R=A or G, S=C or G, W=A or T, Y=C or T.
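The degenerate nucleotide codes listed for FIGS. 59 and 60 follow the standard IUPAC convention, and their handling can be sketched in a few lines. The example below is illustrative only: the primer `GGNTAYTGG` is a hypothetical sequence, not one of the primers shown in the figures, and the codes H and V are added solely because they are the complements of D and B.

```python
from itertools import product

# IUPAC codes named in the figure legends, plus plain bases.
# H and V are included only as complements of D and B.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "K": "GT", "M": "AC", "N": "ACGT",
    "R": "AG", "S": "CG", "W": "AT", "Y": "CT",
}

# Complement of each code on the opposite strand.
COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C",
    "R": "Y", "Y": "R", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D",
    "S": "S", "W": "W", "N": "N",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for code in primer.upper():
        count *= len(IUPAC[code])
    return count


def expand(primer):
    """List every non-degenerate sequence encoded by the primer."""
    choices = [IUPAC[code] for code in primer.upper()]
    return ["".join(bases) for bases in product(*choices)]


def reverse_complement(primer):
    """Complementary strand written 5' to 3', keeping degenerate codes."""
    return "".join(COMPLEMENT[code] for code in reversed(primer.upper()))
```

The reverse-complement helper mirrors what FIG. 60 describes for the second primer, where the complementary strand rather than the coding strand is used.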

There will now be described by way of example a specific mode contemplated by the Inventors. In the following description numerous specific details are set forth in order to provide a thorough understanding. It will be apparent however, to one skilled in the art, that the present invention may be practiced without limitation to these specific details. In other instances, well known methods and structures have not been described so as not to unnecessarily obscure the description.

EXAMPLE 1 Experimental Methods

1) Bioinformatic Tools.

The Basic Local Alignment Search Tool (BLAST) was used with the program default parameters to search for protein homologues (National Center for Biotechnology Information web site; http://www.ncb.nlm.nih.gov/BLAST/). Sequence alignments were performed using Multalin (Corpet, Nucleic Acids Res., 1988, 16, 10881-10890) (http://prodes.toulouse.inra.fr/multalin/multalin.html) or Clustal W (Thompson J D, Higgins D G, Gibson T J. Nucleic Acids Res. 1994, 22: 4673-4680; European Bioinformatics Institute web site; http://www.ebi.ac.uk/clustalw/index.html) with default parameters.

2) Construction of Escherichia coli Expression Vectors Encoding CDSs as C-terminal (His)6-Tagged Fusions.

The sequences coding for AlbC, Rv2275 and YvmC-Bsub were cloned into the E. coli expression vector pQE60 (Qiagen). For this, the coding sequences were amplified by PCR (25 cycles using standard conditions) with primers designed to add an NcoI site overlapping the initiation codon and a BglII site at the other end, immediately following the last sense codon. The PCR products were first cloned into the pGEMT-Easy vector (Promega) and then the NcoI-BglII fragment containing the coding sequence was cloned into pQE60 digested by NcoI and BglII. From the resulting pQE60-derived plasmid, the protein is expressed with a 6×His C-terminal extension.

For AlbC, the primers used were 5′-AGAGCCATGGGACTTGCAGGCTTAGTTCCCGC-3′ SEQ ID NO:28 (NcoI site underlined) and 5′-AGAGAGATCTGGCCGCGTCGGCCAGCTCC-3′ SEQ ID NO:29 (BglII site underlined); the template was pSL122 (French Patent FR0207728, PCT/FR03/01851). The pQE60 derivative for AlbC expression was called pQE60-AlbC (SEQ ID NO:17); the expressed protein AlbC-his having the peptide sequence of SEQ ID NO:35.

For Rv2275, the primers used were 5′-CGGCCATGGCATACGTGGCTGCCGAACCAGGC-3′ SEQ ID NO:30 (NcoI site underlined) and 5′-GGCAGATCTTTCGGCGGGGCTCCCATCAGG-3′ SEQ ID NO:31 (BglII site underlined); the template was pEXP-Rv2275 (PCT/IB2006/001852). The pQE60 derivative for Rv2275 expression was called pQE60-Rv2275 (SEQ ID NO:18); the expressed protein Rv2275-his having the peptide sequence of SEQ ID NO:36.

For YvmC-Bsub from Bacillus subtilis, the primers used were 5′-GGCCCATGGCCGGAATGGTAACGGAAAGAAGGTCTG-3′ SEQ ID NO:32 (NcoI site underlined) and 5′-GGCAGATCTTCCTTCAGATGTGATCCGTTTCTCAGAAAGC-3′ SEQ ID NO:33 (BglII site underlined); the template was pEXP-YvmC-Bsub (PCT/IB2006/001849). The pQE60 derivative for YvmC-Bsub expression was called pQE60-YvmC-Bsub (SEQ ID NO:19); the expressed protein YvmC-Bsub-his having the peptide sequence of SEQ ID NO:37.
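Because the underlining that marked the restriction sites is lost in plain text, the sites can be located programmatically instead. The sketch below uses the six primer sequences quoted above together with the well-known recognition sequences of NcoI (CCATGG) and BglII (AGATCT); the function and variable names are illustrative.

```python
NCOI = "CCATGG"    # NcoI recognition sequence (contains the ATG start codon)
BGLII = "AGATCT"   # BglII recognition sequence

# (forward primer, reverse primer), written 5' to 3' as in the text.
PRIMERS = {
    "AlbC": ("AGAGCCATGGGACTTGCAGGCTTAGTTCCCGC",
             "AGAGAGATCTGGCCGCGTCGGCCAGCTCC"),
    "Rv2275": ("CGGCCATGGCATACGTGGCTGCCGAACCAGGC",
               "GGCAGATCTTTCGGCGGGGCTCCCATCAGG"),
    "YvmC-Bsub": ("GGCCCATGGCCGGAATGGTAACGGAAAGAAGGTCTG",
                  "GGCAGATCTTCCTTCAGATGTGATCCGTTTCTCAGAAAGC"),
}


def site_positions(primers=PRIMERS):
    """Return the 0-based position of the NcoI site in each forward primer
    and of the BglII site in each reverse primer (-1 if absent)."""
    return {name: (fwd.find(NCOI), rev.find(BGLII))
            for name, (fwd, rev) in primers.items()}


for name, (ncoi_at, bglii_at) in site_positions().items():
    print(name, "NcoI at", ncoi_at, "BglII at", bglii_at)
```

In every pair the site sits a few bases from the 5′ end, which is consistent with the stated design: the NcoI site overlaps the initiation codon and the BglII site directly follows the last sense codon on the opposite strand.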

In all the above cases the native AlbC (SEQ ID NO:1), Rv2275 (SEQ ID NO:2) and YvmC-Bsub (SEQ ID NO:3) enzymes are functionally indistinguishable from the 6×His-tagged versions of these proteins, AlbC-his (SEQ ID NO:35), Rv2275-his (SEQ ID NO:36) and YvmC-Bsub-his (SEQ ID NO:37) respectively, expressed in the course of the experiments described herein. This is due to the fact that neither the modified second residue nor the 6×His tag affects the functionality of either conserved portion of these enzymes. Also, these modifications are not located close to or within these two conserved domains.

3) Assay for the In Vivo Formation of Linear Dipeptides by AlbC, Rv2275 and YvmC.

Recombinant expression of AlbC (SEQ ID NO:1) from S. noursei, Rv2275 (SEQ ID NO:2) from M. tuberculosis and YvmC-Bsub (SEQ ID NO:3) from B. subtilis, respectively as SEQ ID NO:35, SEQ ID NO:36 and SEQ ID NO:37, was achieved in E. coli M15pREP4 cells (Invitrogen) with the plasmids pQE60-AlbC (SEQ ID NO:17), pQE60-Rv2275 (SEQ ID NO:18) and pQE60-YvmC-Bsub (SEQ ID NO:19) respectively. 100 μl of chemically competent cells were transformed with 40 ng plasmid using a standard heat-shock procedure (Sambrook et al., Molecular Cloning: A Laboratory Manual, 2001, New York). After 1 h outgrowth at 37° C. with shaking in SOC medium, the 300 μl reaction mixture was added directly to 5 ml LB medium containing 100 μg/ml ampicillin. After overnight incubation at 37° C. with shaking, this starter culture was used to inoculate 200 ml LB medium containing 100 μg/ml ampicillin. Bacteria were grown at 37° C. until OD₆₀₀˜0.7 and 1 mM IPTG was added. Culture was continued at 20° C. for 18 h. The bacterial cells were harvested by centrifugation (30 min, 5,000 g at 4° C.) and suspended in 5 ml ice-cold 9% NaCl solution. The cells were again harvested by centrifugation (30 min, 5,000 g at 4° C.) and suspended in lysis buffer A (100 mM Tris-HCl pH 8.0, 150 mM NaCl, 5% glycerol). The volume of the added lysis buffer was adjusted to obtain a bacterial suspension with an OD₆₀₀˜100. The suspended cells were then lysed with an Eaton press (Rassant). 5% dimethylsulfoxide (DMSO) was added to the lysate just before its centrifugation (30 min, 20,000 g at 4° C.). The soluble fraction was saved, acidified with 2% TFA and centrifuged (30 min, 20,000 g at 4° C.). The resulting soluble fraction was saved for further analysis by LC-MS/MS (see below).

As a control experiment, the whole process (from cell transformation to analysis of the linear dipeptide content) was applied to bacteria transformed by pQE60 (Qiagen), an ampicillin resistance gene-carrying vector that does not express CDS.

4) Sample Analysis by Chromatography Coupled On-Line to Mass Spectrometry.

Liquid Chromatography (LC) separation was carried out on a C18 analytical column (4.6×150 mm, 3 μm, 100 Å, Atlantis, Waters) at a flow rate of 600 μl/min with a 50 min linear gradient from 0 to 45% acetonitrile/MilliQ water with 0.1% formic acid, after a 5 min step in the initial condition for column equilibration and sample desalting. Elution from the LC column was split into two flows: one at 550 μl/min directed to a diode array detector and the remaining flow directed to an electrospray mass spectrometer for MS and MS/MS analyses. The mass spectrometer was an Esquire HCT ion trap mass spectrometer equipped with an orthogonal Atmospheric Pressure Interface-ElectroSpray Ionization (AP-ESI) source (Bruker Daltonik GmbH, Germany).
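The gradient and flow split described above lend themselves to a small worked calculation. The sketch below assumes that time zero is the start of the 5 min initial step and that system dwell volume is negligible; both are simplifications not stated in the text, and the function names are illustrative.

```python
def percent_acetonitrile(t_min, hold_min=5.0, gradient_min=50.0,
                         start_pct=0.0, end_pct=45.0):
    """Programmed % acetonitrile at time t_min (minutes).

    Assumes a 5 min hold at the initial condition followed by a 50 min
    linear ramp from 0 to 45%; dwell volume is ignored.
    """
    if t_min <= hold_min:
        return start_pct
    if t_min >= hold_min + gradient_min:
        return end_pct
    return start_pct + (end_pct - start_pct) * (t_min - hold_min) / gradient_min


def ms_flow(total_ul_min=600.0, to_detector_ul_min=550.0):
    """Flow (μl/min) left for the mass spectrometer after the split."""
    return total_ul_min - to_detector_ul_min


print(percent_acetonitrile(30.0))  # programmed composition at 30 min
print(ms_flow())                   # flow directed to the ESI source
```

Under these assumptions a compound eluting at 30 min leaves the column at a programmed composition halfway through the ramp, and the split leaves 50 μl/min for the mass spectrometer.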

In this online coupling system, the LC-eluted sample was continuously infused into the ESI probe at a flow rate of 50 μl/min. Nitrogen served as the drying and nebulizing gas while helium gas was introduced into the ion trap for efficient trapping and cooling of the ions generated by the ESI as well as for fragmentation processes. Ionization was carried out in positive mode with a nebulizing gas set at 35 psi, a drying gas set at 8 μl/min and a drying temperature set at 340° C. for optimal spray and desolvation. Ionization and mass analysis conditions (capillary high voltage, skimmer and capillary exit voltages and ion transfer parameters) were tuned for an optimal detection of compounds over the range m/z 100 to 400. For structural characterization by mass fragmentation, an isolation width of 1 mass unit was used for isolating the parent ion. A fragmentation energy ramp was used for automatically varying the fragmentation amplitude in order to optimize the MS/MS fragmentation process. Full scan MS and MS/MS spectra were acquired using EsquireControl software and all data were processed using DataAnalysis software.

5) Chemical Synthesis of Linear Dipeptides.

Ile-Leu, Ile-Ile, Ile-Phe, Ile-Met, Phe-Ile, Leu-Met, Leu-Ile, Met-Ile and Tyr-Met were synthesized on an Applied Biosystems apparatus by conventional Fmoc/tBu strategy according to the user manual supplied with the apparatus (Applied Biosystems 433A User Manual Vol. 1, Chapter 3). Purification to homogeneity and physico-chemical characterization of linear peptides was achieved by RP-HPLC and mass spectrometry respectively. All other linear dipeptides were purchased from Sigma and Bachem.

6) Strategy Used for Detection and Identification of Linear Dipeptides.

The search for linear dipeptides was done according to an exhaustive screening protocol summarized in FIG. 58. All samples were analyzed by LC-MS/MS. From the LC-MS/MS data file, ion chromatograms corresponding to the 108 different m/z values associated with the 210 potential linear dipeptides (see Table I) were extracted. A set of extracted ion chromatograms (EICs) was then obtained for each CDS-containing sample as well as for control samples. For each m/z value, comparison of the EICs obtained from the CDS-containing sample and the control sample enabled the detection of EIC peaks specific to CDS activity. These specific peaks were further characterized by MS/MS fragmentation for structural elucidation. Analysis of the daughter ion spectra first made it possible to identify peaks corresponding to linear dipeptides. Indeed, linear dipeptides possess a specific fragmentation signature characterized by a combination of neutral losses of 17, 18, 28 and/or 46, corresponding to fragmentations of the functional groups of peptides and fragmentations of the amide bond as previously proposed (Roepstorff et al., Biomed. Mass Spectrom., 1984, 11, 601; Johnson et al., Anal. Chem., 1987, 59, 2621-2625). Second, the analysis made it possible to identify the two amino acids contained in the linear dipeptide, either by the detection of immonium ions, which are characteristic of amino acid side chains, or by the neutral losses corresponding to the departure of the amino acid residues constituting the linear dipeptide. The final identification of a linear dipeptide in a sample was obtained by confirming the similarity of both its retention time in LC and especially its fragmentation pattern in MS/MS with those of reference dipeptides (commercial or home-made synthetic dipeptides).
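The neutral-loss signature mentioned above can be checked mechanically on a daughter ion spectrum. The following is a minimal sketch: the chemical assignments attached to each nominal loss are the usual ones and are not stated in the text, the matching tolerance is an assumed value, and the fragment list in the example is hypothetical rather than measured data.

```python
# Nominal neutral losses named in the text, with their usual assignments
# (the assignments are an assumption, not taken from the source).
NEUTRAL_LOSSES = {17: "NH3", 18: "H2O", 28: "CO", 46: "H2O + CO"}


def neutral_losses(parent_mz, fragment_mzs, tolerance=0.5):
    """Map each nominal neutral loss to the fragment m/z that shows it.

    tolerance is a hypothetical matching window in m/z units.
    """
    found = {}
    for fragment in fragment_mzs:
        loss = parent_mz - fragment
        for nominal in NEUTRAL_LOSSES:
            if abs(loss - nominal) <= tolerance:
                found[nominal] = fragment
    return found


# Hypothetical daughter ions for a parent ion at m/z 281.0.
print(neutral_losses(281.0, [264.0, 235.0, 104.3]))
```

A spectrum showing several of these losses would be flagged as a possible linear dipeptide and then examined for immonium ions, as the protocol describes.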

TABLE I Calculated monoisotopic mass (m/z) values of natural dipeptides under positive mode of ESI-MS.

AA       Gly    Ala    Ser    Pro    Val    Thr    Cys    Ile    Leu    Asn
residue  57.05  71.08  87.08  97.12  99.13  101.1  103.1  113.2  113.2  114.1
Gly      133.0  147.1  163.1  173.1  175.1  177.1  179.0  189.1  189.1  190.1
Ala             161.1  177.1  187.1  189.1  191.1  193.0  203.1  203.1  204.1
Ser                    193.1  203.1  205.1  207.1  209.0  219.1  219.1  220.1
Pro                           213.1  215.1  217.1  219.1  229.1  229.1  230.1
Val                                  217.1  219.1  221.1  231.2  231.2  232.1
Thr                                         221.1  223.1  233.1  233.1  234.1
Cys                                                225.0  235.1  235.1  236.1
Ile                                                       245.2  245.2  246.1
Leu                                                              245.2  246.1
Asn                                                                     247.1

AA       Asp    Gln    Lys    Glu    Met    His    Phe    Arg    Tyr    Trp
residue  115.1  128.1  128.2  129.1  131.2  137.1  147.2  156.2  163.2  186.2
Gly      191.0  204.1  204.1  205.1  207.1  213.1  223.1  232.1  239.1  262.1
Ala      205.1  218.1  218.1  219.1  221.1  227.1  237.1  246.1  253.1  276.1
Ser      221.1  234.1  234.1  235.1  237.1  243.1  253.1  262.1  269.1  292.1
Pro      231.1  244.1  244.1  245.1  247.1  253.1  263.1  272.2  279.1  302.1
Val      233.1  246.1  246.2  247.1  249.1  255.1  265.1  274.2  281.1  304.1
Thr      235.1  248.1  248.1  249.1  251.1  257.1  267.1  276.1  283.1  306.1
Cys      237.0  250.1  250.1  251.1  253.0  259.1  269.1  278.1  285.1  308.1
Ile      247.1  260.1  260.2  261.1  263.1  269.1  279.2  288.2  295.1  318.2
Leu      247.1  260.1  260.2  261.1  263.1  269.1  279.2  288.2  295.1  318.2
Asn      248.1  261.1  261.1  262.1  264.1  270.1  280.1  289.1  296.1  319.1
Asp      249.1  262.1  262.1  263.1  265.1  271.1  281.1  290.1  297.1  320.1
Gln             275.1  275.2  276.1  278.1  284.1  294.1  303.2  310.1  333.1
Lys                    275.2  276.1  278.1  284.2  294.2  303.2  310.2  333.2
Glu                           277.1  279.1  285.1  295.1  304.1  311.1  334.1
Met                                  281.1  287.1  297.1  306.1  313.1  336.1
His                                         293.1  303.1  312.2  319.1  342.1
Phe                                                313.1  322.2  329.1  352.
Arg                                                       331.2  338.2  361.
Tyr                                                              345.1  368.
Trp                                                                     391.

Values ending in a bare decimal point (352., 361., 368., 391.) indicate data missing or illegible when filed.
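The entries of Table I can be approximated from residue masses: the protonated dipeptide has the mass of its two residues plus one water and one proton. The sketch below is an illustration under stated assumptions; the residue masses in the code are standard monoisotopic values supplied here and are not copied from the table, so individual results may differ from the tabulated figures in the last digit.

```python
from itertools import combinations_with_replacement

# Standard monoisotopic residue masses (Da); supplied for illustration,
# not taken from Table I.
RESIDUE_MASS = {
    "Gly": 57.02146, "Ala": 71.03711, "Ser": 87.03203, "Pro": 97.05276,
    "Val": 99.06841, "Thr": 101.04768, "Cys": 103.00919, "Ile": 113.08406,
    "Leu": 113.08406, "Asn": 114.04293, "Asp": 115.02694, "Gln": 128.05858,
    "Lys": 128.09496, "Glu": 129.04259, "Met": 131.04049, "His": 137.05891,
    "Phe": 147.06841, "Arg": 156.10111, "Tyr": 163.06333, "Trp": 186.07931,
}
WATER = 18.010565   # H2O added on the free termini
PROTON = 1.007276   # charge carrier in positive mode


def dipeptide_mz(aa1, aa2):
    """m/z of the singly protonated linear dipeptide aa1-aa2."""
    return RESIDUE_MASS[aa1] + RESIDUE_MASS[aa2] + WATER + PROTON


def mz_table():
    """m/z for every unordered pair of the 20 natural amino acids."""
    return {pair: dipeptide_mz(*pair)
            for pair in combinations_with_replacement(RESIDUE_MASS, 2)}


table = mz_table()
print(len(table))                             # number of amino acid pairs
print(round(dipeptide_mz("Met", "Met"), 1))   # compare with Table I
```

The calculation also makes two limits of the m/z screen visible: sequence isomers such as Phe-Leu and Leu-Phe share one value, and Met-Tyr and Phe-Phe differ by less than 0.1, which is why retention time and MS/MS fragmentation are needed to tell such species apart.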

EXAMPLE 2 The In Vivo Synthesis of Linear Dipeptides by CDSs

Synthesis of linear dipeptides by CDSs was assessed by searching for linear dipeptides in soluble extracts obtained from bacteria expressing AlbC, Rv2275 and YvmC-Bsub respectively. In each case the enzyme was expressed with a C-terminal 6×His tag, and its second residue was modified by the introduction of the NcoI restriction enzyme target sequence, which allowed cloning into the pQE60 vector as previously described (see Experimental Methods). The actual peptide sequences of the expressed enzymes were AlbC-his (SEQ ID NO:35), Rv2275-his (SEQ ID NO:36) and YvmC-Bsub-his (SEQ ID NO:37). The extracts were prepared as previously described (see Experimental Methods) and, in each case, the production of a protein whose molecular weight and N-terminal sequence corresponded to those expected was observed. At the same time, a soluble extract obtained from bacteria expressing no CDS (pQE60) was also prepared. Finally, all these samples were analyzed by LC-MS/MS and screened for linear dipeptides as depicted in FIG. 58. As a method control, the soluble fraction of E. coli cells expressing AlbC-his (SEQ ID NO:35) was first analyzed.

1) Additional Linear Dipeptides Produced in the Presence of AlbC.

The soluble fraction of E. coli cells expressing AlbC-his (SEQ ID NO:35) was analyzed by LC-MS/MS leading to a first set of EICs. The same analysis was performed with the soluble fraction of E. coli cells not expressing AlbC-his (SEQ ID NO:35) leading to a second set of EICs. Comparison of the two sets of EICs for each m/z value enabled the detection of EIC peaks specific to the AlbC activity. Each EIC peak was characterized by MS/MS fragmentation and the analysis of the daughter ion spectra indicated that 15 peaks (shown in FIG. 2) matched with linear dipeptides (see summary shown as Table II).
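The comparison of the two sets of EICs can be sketched as a simple filter. The text gives no numerical criterion for calling a peak specific, so the fold-change threshold and noise floor below are hypothetical, as are the intensities in the example; peaks are matched here by identical (m/z, retention time) keys, which a real workflow would replace with tolerance-based matching.

```python
def specific_peaks(sample, control, min_fold=5.0, noise=1000.0):
    """Return peaks present in the sample but absent or weak in the control.

    sample, control: dicts mapping (m/z, retention time in min) to intensity.
    min_fold and noise are assumed values, not taken from the source.
    """
    specific = []
    for key, intensity in sample.items():
        baseline = max(control.get(key, 0.0), noise)
        if intensity / baseline >= min_fold:
            specific.append(key)
    # List peaks by increasing retention time, as in Table II.
    return sorted(specific, key=lambda peak: peak[1])


# Hypothetical intensities used only to exercise the filter.
sample = {(281.0, 20.6): 5.0e4, (263.0, 22.9): 3.0e4, (205.1, 10.0): 2.0e4}
control = {(205.1, 10.0): 1.9e4}
print(specific_peaks(sample, control))
```

Peaks passing such a filter are only candidates; in the protocol each one is then subjected to MS/MS fragmentation before being assigned to a dipeptide.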

The mass characteristics of each of the 15 EIC peaks, in particular the detection of immonium ions, led to the unambiguous identification of the amino acids constituting 8 different dipeptides corresponding to peak 1, peak 2, peak 3, peak 8, peak 9, peak 11, peak 12, and peak 15 (Table II). The nature of the amino acids constituting the other dipeptides, corresponding to peak 4, peak 5, peak 6, peak 7, peak 10, peak 13 and peak 14, remained to be confirmed because they all contain leucyl or isoleucyl residues (see Table II), which have an identical immonium ion m/z of 86.5. The definitive identification of the nature and also the sequence of all detected linear dipeptides was achieved by comparing their retention times in LC and also their fragmentation patterns in MS/MS (i.e. number of fragment ions, m/z values, and intensities of the generated fragment ions; see Table II and figures numbered herein) to those of reference chemically-synthesized dipeptides (see Table III and figures numbered herein). Due to LC column ageing, the retention times of 3 detected linear dipeptides (namely Met-Met, Tyr-Met and Met-Tyr) were shifted compared to those of the corresponding reference dipeptides, but the elution order was the same for detected and reference dipeptides. Taken together, all these data established clearly that AlbC expression in E. coli cells is responsible for the in vivo formation of Leu-Phe and Phe-Leu as previously reported (U.S. Pat. No. 20050287626) and also Phe-Phe, Phe-Tyr, Tyr-Phe, Leu-Leu, Leu-Tyr, Tyr-Leu, Phe-Met, Met-Phe, Leu-Met, Met-Leu, Met-Met, Tyr-Met and Met-Tyr (see Tables II & III).
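The statement that the elution order is preserved despite column ageing can be checked directly from the retention times reported in this document. In the sketch below, the detected values are those listed in Table II and the reference values are those given in the descriptions of FIGS. 18 to 42; the function names are illustrative.

```python
# Retention times (min) of peaks detected with AlbC (Table II).
DETECTED = {
    "Met-Met": 20.6, "Met-Tyr": 22.0, "Tyr-Met": 22.5, "Leu-Met": 22.9,
    "Leu-Tyr": 23.8, "Met-Leu": 25.0, "Tyr-Leu": 25.9, "Phe-Tyr": 26.6,
    "Phe-Met": 27.0, "Leu-Leu": 27.3, "Tyr-Phe": 29.0, "Met-Phe": 29.3,
    "Phe-Leu": 30.8, "Leu-Phe": 31.5, "Phe-Phe": 33.4,
}

# Retention times (min) of the chemically-synthesized references
# (descriptions of FIGS. 18 to 42).
REFERENCE = {
    "Met-Met": 19.4, "Met-Tyr": 21.6, "Tyr-Met": 22.8, "Leu-Met": 22.9,
    "Leu-Tyr": 23.7, "Met-Leu": 25.3, "Tyr-Leu": 25.8, "Phe-Tyr": 26.7,
    "Phe-Met": 27.1, "Leu-Leu": 27.4, "Tyr-Phe": 29.0, "Met-Phe": 29.5,
    "Phe-Leu": 30.8, "Leu-Phe": 31.5, "Phe-Phe": 33.4,
}


def elution_order(times):
    """Dipeptide names sorted by increasing retention time."""
    return sorted(times, key=times.get)


def same_elution_order(detected, reference):
    """True if dipeptides common to both sets elute in the same order."""
    shared = detected.keys() & reference.keys()
    return (elution_order({k: detected[k] for k in shared})
            == elution_order({k: reference[k] for k in shared}))


def shifts(detected, reference):
    """Detected minus reference retention time, rounded to 0.1 min."""
    return {k: round(detected[k] - reference[k], 1)
            for k in detected if k in reference}


print(same_elution_order(DETECTED, REFERENCE))
print(shifts(DETECTED, REFERENCE))
```

With these values the orders agree, and the largest shift is that of Met-Met.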

TABLE II LC-MS/MS analysis of the soluble fraction of E. coli cells expressing AlbC: summary of data extracted from figures whose numbers are reported herein and identification of linear dipeptides.

             MS and MS/MS data                              LC data
EIC          m/z     Immonium ions        See Figures       Tr            Identified
peaks^(a)            detected             (n^(o))           (min)^(b)     dipeptides^(c)
1            281.0   iMet                 3                 20.6          Met-Met
2            313.1   iTyr, iMet           4                 22.0          Met-Tyr
3            313.1   iTyr, iMet           5                 22.5          Tyr-Met
4            263.0   iMet, iLeu or iIle   6                 22.9          Leu-Met
5            295.1   iTyr, iLeu or iIle   7                 23.8          Leu-Tyr
6            263.0   iMet, iLeu or iIle   8                 25.0          Met-Leu
7            295.1   iTyr, iLeu or iIle   9                 25.9          Tyr-Leu
8            329.1   iPhe, iTyr           10                26.6          Phe-Tyr
9            297.1   iMet, iPhe           11                27.0          Phe-Met
10           245.1   iLeu or iIle         12                27.3          Leu-Leu
11           329.1   iPhe, iTyr           13                29.0          Tyr-Phe
12           297.1   iMet, iPhe           14                29.3          Met-Phe
13           279.1   iPhe, iLeu or iIle   15                30.8          Phe-Leu
14           279.1   iPhe, iLeu or iIle   16                31.5          Leu-Phe
15           313.1   iPhe                 17                33.4          Phe-Phe

^(a)EIC peaks are listed by increasing retention times according to FIG. 2.
^(b)Tr is the abbreviation for retention time.
^(c)Linear dipeptides were definitively identified by comparing their retention times, their m/z values and their fragmentation patterns with those of reference dipeptides (see Table III).
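The immonium ion assignments in Table II can be related to theoretical values: an immonium ion has the mass of the amino acid residue minus CO, plus a proton. The sketch below is illustrative; the residue masses are standard monoisotopic values supplied here, and the 0.5 matching window is an assumed tolerance chosen to accommodate the ion trap readings reported in this document.

```python
# Standard monoisotopic residue masses (Da); supplied for illustration.
RESIDUE_MASS = {
    "Leu/Ile": 113.08406,  # isomers, hence indistinguishable by mass
    "Met": 131.04049,
    "Phe": 147.06841,
    "Tyr": 163.06333,
}
CO = 27.994915      # carbon monoxide lost from the residue
PROTON = 1.007276   # charge carrier


def immonium_mz(residue):
    """Theoretical m/z of the immonium ion of a residue."""
    return RESIDUE_MASS[residue] - CO + PROTON


def assign_immonium(observed_mz, tolerance=0.5):
    """Residues whose immonium ion lies within tolerance of observed_mz.

    tolerance is a hypothetical window, not a value from the source.
    """
    return [name for name in RESIDUE_MASS
            if abs(immonium_mz(name) - observed_mz) <= tolerance]


for observed in (86.5, 104.3, 120.2, 136.0):
    print(observed, assign_immonium(observed))
```

The single entry for Leu and Ile reflects the ambiguity noted in the text: the two residues give the same immonium ion, so peaks containing them have to be resolved by comparison with reference dipeptides.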

With reference to FIG. 3 illustrates the MS and MS/MS spectra of the EICpeak 1 detected at 20.6 min during the analysis of the soluble fractionof E. coli cells expressing AlbC. The MS spectrum shows a main m/z peakat 281.0±0.1 (FIG. 3 a). This peak was isolated as parent ion andsubjected to MS/MS fragmentation giving rise to a daughter ions spectrum(FIG. 3 b). Encircled m/z peak at 104.3±0.1 matches to immonium ion ofMet, respectively referred to as iMet.

With reference to FIG. 4 illustrates the MS and MS/MS spectra of the EICpeak 2 detected at 22.0 min during the analysis of the soluble fractionof E. coli cells expressing AlbC. The MS spectrum shows a m/z peak at313.1±0.1 (FIG. 4 a). This peak was isolated as parent ion and subjectedto MS/MS fragmentation giving rise to a daughter ions spectrum (FIG. 4b). Encircled m/z peak at 136.0±0.1 matches to immonium ion of Tyr,respectively referred to as iTyr and encircled m/z peak at 104.2±0.1matches to immonium ion of Met referred to as iMet.

With reference to FIG. 5 illustrates the MS and MS/MS spectra of the EICpeak 3 detected at 22.5 min during the analysis of the soluble fractionof E. coli cells expressing AlbC. The MS spectrum shows a m/z peak at313.1±0.1 (FIG. 5 a). This peak was isolated as parent ion and subjectedto MS/MS fragmentation giving rise to a daughter ions spectrum (FIG. 5b). Encircled m/z peak at 136.1±0.1 matches to immonium ion of Tyr,respectively referred to as iTyr and encircled m/z peak at 104.3±0.1matches to immonium ion of Met referred to as iMet.

With reference to FIG. 6 illustrates the MS and MS/MS spectra of the EICpeak 4 detected at 22.9 min during the analysis of the soluble fractionof E. coli cells expressing AlbC. The MS spectrum shows a main m/z peakat 263.0±0.1 (FIG. 6 a). This peak was isolated as parent ion andsubjected to MS/MS fragmentation giving rise to a daughter ions spectrum(FIG. 6 b). Encircled m/z peak at 86.5±0.1 matches to immonium ion ofLeu or Ile, respectively referred to as iLeu or iIle and encircled m/zpeak at 104.3±0.1 matches to immonium ion of Met referred to as iMet.

With reference to FIG. 7 illustrates the MS and MS/MS spectra of the EICpeak 5 detected at 23.8 min during the analysis of the soluble fractionof E. coli cells expressing AlbC. The MS spectrum shows a minor m/z peakat 295.1±0.1 not detected in the control sample (FIG. 7 a). This peakwas isolated as parent ion and subjected to MS/MS fragmentation givingrise to a daughter ions spectrum (FIG. 7 b). Encircled m/z peak at136.0±0.1 matches to immonium ion of Tyr referred to as iTyr andencircled m/z peak at 86.6±0.1 matches to immonium ion of Leu or Ile,respectively referred to as iLeu or iIle.

FIG. 8 illustrates the MS and MS/MS spectra of EIC peak 6, detected at 25.0 min during the analysis of the soluble fraction of E. coli cells expressing AlbC. The MS spectrum shows a main m/z peak at 263.0±0.1 (FIG. 8 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 8 b). The encircled m/z peak at 104.2±0.1 matches the immonium ion of Met, referred to as iMet, and the encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu or Ile, referred to as iLeu or iIle respectively.

FIG. 9 illustrates the MS and MS/MS spectra of EIC peak 7, detected at 25.9 min during the analysis of the soluble fraction of E. coli cells expressing AlbC. The MS spectrum shows an m/z peak at 295.1±0.1 (FIG. 9 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 9 b). The encircled m/z peak at 136.1±0.1 matches the immonium ion of Tyr, referred to as iTyr, and the encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu or Ile, referred to as iLeu or iIle respectively.

FIG. 10 illustrates the MS and MS/MS spectra of EIC peak 8, detected at 26.6 min during the analysis of the soluble fraction of E. coli cells expressing AlbC. The MS spectrum shows a minor m/z peak at 329.1±0.1 not detected in the control sample (FIG. 10 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 10 b). The encircled m/z peak at 120.2±0.1 matches the immonium ion of Phe, referred to as iPhe, and the encircled m/z peak at 136.2±0.1 matches the immonium ion of Tyr, referred to as iTyr.

FIG. 11 illustrates the MS and MS/MS spectra of EIC peak 9, detected at 27.0 min during the analysis of the soluble fraction of E. coli cells expressing AlbC. The MS spectrum shows an m/z peak at 297.1±0.1 (FIG. 11 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 11 b). The encircled m/z peak at 104.3±0.1 matches the immonium ion of Met, referred to as iMet, and the encircled m/z peak at 120.1±0.1 matches the immonium ion of Phe, referred to as iPhe.

FIG. 12 illustrates the MS and MS/MS spectra of EIC peak 10, detected at 27.3 min during the analysis of the soluble fraction of E. coli cells expressing AlbC. The MS spectrum shows a main m/z peak at 245.1±0.1 (FIG. 12 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 12 b). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu or Ile, referred to as iLeu or iIle respectively.

FIG. 13 illustrates the MS and MS/MS spectra of EIC peak 11, detected at 29.0 min during the analysis of the soluble fraction of E. coli cells expressing AlbC. The MS spectrum shows an m/z peak at 329.1±0.1 not detected in the control sample (FIG. 13 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 13 b). The encircled m/z peak at 136.1±0.1 matches the immonium ion of Tyr, referred to as iTyr, and the encircled m/z peak at 120.1±0.1 matches the immonium ion of Phe, referred to as iPhe.

FIG. 14 illustrates the MS and MS/MS spectra of EIC peak 12, detected at 29.3 min during the analysis of the soluble fraction of E. coli cells expressing AlbC. The MS spectrum shows an m/z peak at 297.1±0.1 not detected in the control sample (FIG. 14 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 14 b). The encircled m/z peak at 120.1±0.1 matches the immonium ion of Phe, referred to as iPhe, and the encircled m/z peak at 104.2±0.1 matches the immonium ion of Met, referred to as iMet.

FIG. 15 illustrates the MS and MS/MS spectra of EIC peak 13, detected at 30.8 min during the analysis of the soluble fraction of E. coli cells expressing AlbC. The MS spectrum shows a main m/z peak at 279.1±0.1 (FIG. 15 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 15 b). The encircled m/z peak at 120.1±0.1 matches the immonium ion of Phe, referred to as iPhe, and the encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu or Ile, referred to as iLeu or iIle respectively.

FIG. 16 illustrates the MS and MS/MS spectra of EIC peak 14, detected at 31.5 min during the analysis of the soluble fraction of E. coli cells expressing AlbC. The MS spectrum shows a main m/z peak at 279.1±0.1 (FIG. 16 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 16 b). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu or Ile, referred to as iLeu or iIle respectively, and the encircled m/z peak at 120.2±0.1 matches the immonium ion of Phe, referred to as iPhe.

FIG. 17 illustrates the MS and MS/MS spectra of EIC peak 15, detected at 33.4 min during the analysis of the soluble fraction of E. coli cells expressing AlbC. The MS spectrum shows a minor m/z peak at 313.1±0.1 not detected in the control sample (FIG. 17 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 17 b). The encircled m/z peak at 120.2±0.1 matches the immonium ion of Phe, referred to as iPhe.

TABLE III
LC-MS/MS analysis of reference chemically-synthesized dipeptides: summary of data extracted from the figures whose numbers are reported herein.

                 MS and MS/MS data                          LC data
Linear                    Immonium           See FIGS.
dipeptides^(a)   m/z      ions detected      (No.)          Tr (min)^(b)
Met-Met          281.0    iMet               18             19.4
Met-Tyr          313.1    iMet, iTyr         19             21.6
Ile-Met          263.0    iMet, iIle         20             21.8
Tyr-Met          313.1    iMet, iTyr         21             22.8
Leu-Met          263.0    iLeu, iMet         22             22.9
Ile-Tyr          295.1    iIle, iTyr         23             23.3
Tyr-Tyr          345.1    iTyr               24             23.5
Leu-Tyr          295.1    iLeu, iTyr         25             23.7
Met-Ile          263.0    iMet, iIle         26             24.0
Ile-Ile          245.1    iIle, iIle         27             24.1
Tyr-Ile          295.1    iTyr, iIle         28             24.4
Met-Leu          263.1    iMet, iLeu         29             25.3
Leu-Ile          245.1    iLeu, iIle         30             25.4
Tyr-Leu          295.1    iTyr, iLeu         31             25.8
Ile-Leu          245.1    iLeu, iIle         32             26.1
Phe-Tyr          329.1    iPhe, iTyr         33             26.7
Phe-Met          297.1    iPhe, iMet         34             27.1
Leu-Leu          245.1    iLeu               35             27.4
Phe-Ile          279.1    iPhe, iIle         36             28.7
Tyr-Phe          329.1    iTyr, iPhe         37             29.0
Met-Phe          297.0    iMet, iPhe         38             29.5
Ile-Phe          279.1    iIle, iPhe         39             30.2
Phe-Leu          279.1    iPhe, iLeu         40             30.8
Leu-Phe          279.1    iLeu, iPhe         41             31.5
Phe-Phe          313.1    iPhe               42             33.4

^(a)Linear dipeptides are listed by increasing retention times.
^(b)Tr is the abbreviation for retention time.
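By way of illustration only, the m/z values listed in Table III can be cross-checked against theoretical values. The sketch below assumes singly protonated ions and standard monoisotopic residue masses, neither of which is stated in the present description; the function names are hypothetical. The measured values reported herein may differ from these theoretical values by a few tenths of an m/z unit, so any comparison tolerance is likewise an assumption.

```python
# Standard monoisotopic masses in Da (assumed values, not taken from this document).
RESIDUE_MASS = {
    "Leu": 113.08406,
    "Ile": 113.08406,  # Leu and Ile are isomass residues
    "Met": 131.04049,
    "Phe": 147.06841,
    "Tyr": 163.06333,
}
WATER = 18.01056
PROTON = 1.00728
CO = 27.99491


def dipeptide_mh(first: str, second: str) -> float:
    """Theoretical m/z of the singly protonated linear dipeptide [M+H]+."""
    return RESIDUE_MASS[first] + RESIDUE_MASS[second] + WATER + PROTON


def immonium_mz(residue: str) -> float:
    """Theoretical m/z of the immonium ion: residue mass minus CO plus one proton."""
    return RESIDUE_MASS[residue] - CO + PROTON


if __name__ == "__main__":
    for pair in [("Tyr", "Tyr"), ("Met", "Met"), ("Phe", "Leu")]:
        print("-".join(pair), round(dipeptide_mh(*pair), 2))
    for res in ("Leu", "Ile", "Met", "Phe", "Tyr"):
        print("i" + res, round(immonium_mz(res), 2))
```

Because Leu and Ile have the same residue mass, neither the parent ion nor the immonium ion distinguishes them, and the order of the two residues does not change the parent m/z either; this is why the retention time is also recorded for each reference dipeptide.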

FIG. 18 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Met-Met. An EIC peak is detected at 19.4 minutes (FIG. 18 a). The MS spectrum shows an m/z peak at 281.0±0.1 (FIG. 18 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 18 c). The encircled m/z peak at 104.2±0.1 matches the immonium ion of Met, referred to as iMet.

FIG. 19 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Met-Tyr. An EIC peak is detected at 21.6 minutes (FIG. 19 a). The MS spectrum shows an m/z peak at 313.1±0.1 (FIG. 19 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 19 c). The encircled m/z peak at 136.0±0.1 matches the immonium ion of Tyr, referred to as iTyr, and the encircled m/z peak at 104.2±0.1 matches the immonium ion of Met, referred to as iMet.

FIG. 20 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Ile-Met. An EIC peak is detected at 21.8 minutes (FIG. 20 a). The MS spectrum shows an m/z peak at 263.0±0.1 (FIG. 20 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 20 c). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Ile, referred to as iIle, and the encircled m/z peak at 104.3±0.1 matches the immonium ion of Met, referred to as iMet.

FIG. 21 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Tyr-Met. An EIC peak is detected at 22.8 minutes (FIG. 21 a). The MS spectrum shows an m/z peak at 313.1±0.1 (FIG. 21 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 21 c). The encircled m/z peak at 136.0±0.1 matches the immonium ion of Tyr, referred to as iTyr, and the encircled m/z peak at 104.2±0.1 matches the immonium ion of Met, referred to as iMet.

FIG. 22 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Leu-Met. An EIC peak is detected at 22.9 minutes (FIG. 22 a). The MS spectrum shows an m/z peak at 263.0±0.1 (FIG. 22 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 22 c). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu, referred to as iLeu, and the encircled m/z peak at 104.3±0.1 matches the immonium ion of Met, referred to as iMet.

FIG. 23 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Ile-Tyr. An EIC peak is detected at 23.3 minutes (FIG. 23 a). The MS spectrum shows an m/z peak at 295.1±0.1 (FIG. 23 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 23 c). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Ile, referred to as iIle, and the encircled m/z peak at 136.1±0.1 matches the immonium ion of Tyr, referred to as iTyr.

FIG. 24 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Tyr-Tyr. An EIC peak is detected at 23.5 minutes (FIG. 24 a). The MS spectrum shows an m/z peak at 345.1±0.1 (FIG. 24 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 24 c). The encircled m/z peak at 136.1±0.1 matches the immonium ion of Tyr, referred to as iTyr.

FIG. 25 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Leu-Tyr. An EIC peak is detected at 23.7 minutes (FIG. 25 a). The MS spectrum shows an m/z peak at 295.1±0.1 (FIG. 25 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 25 c). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu, referred to as iLeu, and the encircled m/z peak at 136.1±0.1 matches the immonium ion of Tyr, referred to as iTyr.

FIG. 26 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Met-Ile. An EIC peak is detected at 24.0 minutes (FIG. 26 a). The MS spectrum shows an m/z peak at 263.0±0.1 (FIG. 26 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 26 c). The encircled m/z peak at 104.2±0.1 matches the immonium ion of Met, referred to as iMet, and the encircled m/z peak at 86.5±0.1 matches the immonium ion of Ile, referred to as iIle.

FIG. 27 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Ile-Ile. An EIC peak is detected at 24.1 minutes (FIG. 27 a). The MS spectrum shows an m/z peak at 245.1±0.1 (FIG. 27 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 27 c). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Ile, referred to as iIle.

FIG. 28 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Tyr-Ile. An EIC peak is detected at 24.4 minutes (FIG. 28 a). The MS spectrum shows an m/z peak at 295.1±0.1 (FIG. 28 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 28 c). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Ile, referred to as iIle, and the encircled m/z peak at 136.1±0.1 matches the immonium ion of Tyr, referred to as iTyr.

FIG. 29 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Met-Leu. An EIC peak is detected at 25.3 minutes (FIG. 29 a). The MS spectrum shows an m/z peak at 263.1±0.1 (FIG. 29 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 29 c). The encircled m/z peak at 104.2±0.1 matches the immonium ion of Met, referred to as iMet, and the encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu, referred to as iLeu.

FIG. 30 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Leu-Ile. An EIC peak is detected at 25.4 minutes (FIG. 30 a). The MS spectrum shows an m/z peak at 245.1±0.1 (FIG. 30 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 30 c). The encircled m/z peak at 86.5±0.1 matches the immonium ions of Leu and Ile, referred to as iLeu and iIle respectively.

FIG. 31 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Tyr-Leu. An EIC peak is detected at 25.8 minutes (FIG. 31 a). The MS spectrum shows an m/z peak at 295.1±0.1 (FIG. 31 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 31 c). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu, referred to as iLeu, and the encircled m/z peak at 136.1±0.1 matches the immonium ion of Tyr, referred to as iTyr.

FIG. 32 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Ile-Leu. An EIC peak is detected at 26.1 minutes (FIG. 32 a). The MS spectrum shows an m/z peak at 245.1±0.1 (FIG. 32 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 32 c). The encircled m/z peak at 86.5±0.1 matches the immonium ions of Ile and Leu, referred to as iIle and iLeu respectively.

FIG. 33 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Phe-Tyr. An EIC peak is detected at 26.7 minutes (FIG. 33 a). The MS spectrum shows an m/z peak at 329.1±0.1 (FIG. 33 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 33 c). The encircled m/z peak at 120.1±0.1 matches the immonium ion of Phe, referred to as iPhe, and the encircled m/z peak at 136.1±0.1 matches the immonium ion of Tyr, referred to as iTyr.

FIG. 34 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Phe-Met. An EIC peak is detected at 27.1 minutes (FIG. 34 a). The MS spectrum shows an m/z peak at 297.1±0.1 (FIG. 34 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 34 c). The encircled m/z peak at 120.2±0.1 matches the immonium ion of Phe, referred to as iPhe, and the encircled m/z peak at 104.3±0.1 matches the immonium ion of Met, referred to as iMet.

FIG. 35 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Leu-Leu. An EIC peak is detected at 27.4 minutes (FIG. 35 a). The MS spectrum shows an m/z peak at 245.1±0.1 (FIG. 35 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 35 c). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu, referred to as iLeu.

FIG. 36 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Phe-Ile. An EIC peak is detected at 28.7 minutes (FIG. 36 a). The MS spectrum shows an m/z peak at 279.1±0.1 (FIG. 36 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 36 c). The encircled m/z peak at 120.1±0.1 matches the immonium ion of Phe, referred to as iPhe, and the encircled m/z peak at 86.5±0.1 matches the immonium ion of Ile, referred to as iIle.

FIG. 37 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Tyr-Phe. An EIC peak is detected at 29.0 minutes (FIG. 37 a). The MS spectrum shows an m/z peak at 329.1±0.1 (FIG. 37 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 37 c). The encircled m/z peak at 120.1±0.1 matches the immonium ion of Phe, referred to as iPhe, and the encircled m/z peak at 136.1±0.1 matches the immonium ion of Tyr, referred to as iTyr.

FIG. 38 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Met-Phe. An EIC peak is detected at 29.5 minutes (FIG. 38 a). The MS spectrum shows an m/z peak at 297.0±0.1 (FIG. 38 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 38 c). The encircled m/z peak at 120.2±0.1 matches the immonium ion of Phe, referred to as iPhe, and the encircled m/z peak at 104.3±0.1 matches the immonium ion of Met, referred to as iMet.

FIG. 39 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Ile-Phe. An EIC peak is detected at 30.2 minutes (FIG. 39 a). The MS spectrum shows an m/z peak at 279.1±0.1 (FIG. 39 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 39 c). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Ile, referred to as iIle, and the encircled m/z peak at 120.2±0.1 matches the immonium ion of Phe, referred to as iPhe.

FIG. 40 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Phe-Leu. An EIC peak is detected at 30.8 minutes (FIG. 40 a). The MS spectrum shows an m/z peak at 279.1±0.1 (FIG. 40 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 40 c). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu, referred to as iLeu, and the encircled m/z peak at 120.1±0.1 matches the immonium ion of Phe, referred to as iPhe.

FIG. 41 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Leu-Phe. An EIC peak is detected at 31.5 minutes (FIG. 41 a). The MS spectrum shows an m/z peak at 279.1±0.1 (FIG. 41 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 41 c). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu, referred to as iLeu, and the encircled m/z peak at 120.2±0.1 matches the immonium ion of Phe, referred to as iPhe.

FIG. 42 illustrates the EIC and the MS and MS/MS spectra of chemically-synthesized Phe-Phe. An EIC peak is detected at 33.4 minutes (FIG. 42 a). The MS spectrum shows an m/z peak at 313.1±0.1 (FIG. 42 b). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 42 c). The encircled m/z peak at 120.1±0.1 matches the immonium ion of Phe, referred to as iPhe.

2) Linear Dipeptides Produced in the Presence of Rv2275.

The soluble fraction of E. coli cells expressing Rv2275-his (SEQ ID NO:36) was analyzed by LC-MS as previously described. The resulting set of EICs was compared to that of the control experiment using cells transformed with a vector not coding for a CDS. This comparison showed one significant EIC peak matching a linear dipeptide and specific to Rv2275 activity (FIG. 43 and FIG. 44, as specified in Table IV).

TABLE IV
LC-MS/MS analysis of the soluble fraction of E. coli cells expressing Rv2275: summary of data extracted from the figure whose number is reported herein and identification of the linear dipeptide.

            MS and MS/MS data                       LC data
EIC                  Immonium        See Figure                    Identified
Peak^(a)    m/z      ion detected    (No.)          Tr (min)^(b)   dipeptides^(c)
1           345.1    iTyr            44             23.3           Tyr-Tyr

^(a)EIC peak named according to FIG. 43.
^(b)Tr is the abbreviation for retention time.
^(c)The linear dipeptide was definitely identified by comparing its retention time, its m/z value and its fragmentation pattern with those of reference dipeptides (see Table III).

FIG. 43 illustrates the EICs of dipeptide m/z values specific to Rv2275, detected by LC-MS analysis of the soluble fraction of E. coli cells expressing Rv2275 (upper black traces), compared to the same set of EICs from an LC-MS analysis of the control sample (lower grey traces). The only significant specific EIC peak was labeled as specified in Table IV for identification by MS and MS/MS, as illustrated in FIG. 44.

FIG. 44 illustrates the MS and MS/MS spectra of EIC peak 1, detected at 23.3 min during the analysis of the soluble fraction of E. coli cells expressing Rv2275. The MS spectrum shows an m/z peak at 345.1±0.1 not detected in the control sample (FIG. 44 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 44 b). The encircled m/z peak at 136.1±0.1 matches the immonium ion of Tyr, referred to as iTyr.

This EIC peak was further characterized by MS/MS fragmentation. Analysis of the daughter ion spectrum enabled the identification of one potential matching linear dipeptide, namely Tyr-Tyr (Table IV). The comparison of its retention time and its fragmentation pattern with those of the reference chemically-synthesized Tyr-Tyr (see Table III and FIG. 24) allowed the Inventors to conclude that the expression of Rv2275 in E. coli cells is responsible for the in vivo formation of Tyr-Tyr (see Table IV).

3) Linear Dipeptides Produced in the Presence of YvmC-Bsub.

The soluble fraction of E. coli cells expressing YvmC-Bsub-his (SEQ ID NO:37) was analyzed by LC-MS as previously described. The resulting set of EICs was compared to that of a control experiment using cells transformed with a vector not expressing a CDS. This comparison enabled the Inventors to detect 12 EIC peaks matching linear dipeptides and specific to the YvmC-Bsub activity (FIG. 45 and the Figures specified in Table V).

TABLE V
LC-MS/MS analysis of the soluble fraction of E. coli cells expressing YvmC-Bsub: summary of data extracted from the figures whose numbers are reported herein and identification of linear dipeptides.

            MS and MS/MS data                             LC data
EIC                  Immonium                See Figures                  Identified
Peaks^(a)   m/z      ions detected           (No.)         Tr (min)^(b)   dipeptides^(c)
1           281.0    iMet                    46            20.6           Met-Met
2           263.1    iMet, iLeu or iIle      47            21.8           Ile-Met
3           263.0    iMet, iLeu or iIle      48            22.8           Leu-Met
4           263.0    iMet, iLeu or iIle      49            24.9           Met-Leu
5           245.1    iLeu or iIle            50            25.4           Leu-Ile
6           245.1    iLeu or iIle            51            25.9           Ile-Leu
7           297.0    iMet, iPhe              52            26.8           Phe-Met
8           245.1    iLeu or iIle            53            27.3           Leu-Leu
9           297.0    iMet, iPhe              54            29.2           Met-Phe
10          279.1    iPhe, iLeu or iIle      55            30.8           Phe-Leu
11          279.1    iPhe, iLeu or iIle      56            31.4           Leu-Phe
12          313.1    iPhe                    57            33.3           Phe-Phe

^(a)EIC peaks are listed by increasing retention times according to FIG. 45.
^(b)Tr is the abbreviation for retention time.
^(c)Linear dipeptides were definitely identified by comparing their retention times, their m/z values and their fragmentation patterns with those of reference dipeptides (see Table III).

FIG. 45 illustrates the EICs of dipeptide m/z values specific to YvmC, detected by LC-MS analysis of the soluble fraction of E. coli cells expressing YvmC (upper black traces), compared to the same set of EICs from an LC-MS analysis of the control sample (lower grey traces). A close-up view is provided to distinguish the minor products detected in the sample. The specific EIC peaks were labeled as specified in Table V for identification by MS and MS/MS, as illustrated in FIGS. 46 to 57.

FIG. 46 illustrates the MS and MS/MS spectra of EIC peak 1, detected at 20.6 min during the analysis of the soluble fraction of E. coli cells expressing YvmC. The MS spectrum shows a main m/z peak at 281.0±0.1 not detected in the control sample (FIG. 46 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 46 b). The encircled m/z peak at 104.3±0.1 matches the immonium ion of Met, referred to as iMet.

FIG. 47 illustrates the MS and MS/MS spectra of EIC peak 2, detected at 21.8 min during the analysis of the soluble fraction of E. coli cells expressing YvmC. The MS spectrum shows an m/z peak at 263.1±0.1 not detected in the control sample (FIG. 47 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 47 b). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu or Ile, referred to as iLeu or iIle respectively, and the encircled m/z peak at 104.3±0.1 matches the immonium ion of Met, referred to as iMet.

FIG. 48 illustrates the MS and MS/MS spectra of EIC peak 3, detected at 22.8 min during the analysis of the soluble fraction of E. coli cells expressing YvmC. The MS spectrum shows a main m/z peak at 263.0±0.1 (FIG. 48 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 48 b). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu or Ile, referred to as iLeu or iIle respectively, and the encircled m/z peak at 104.2±0.1 matches the immonium ion of Met, referred to as iMet.

FIG. 49 illustrates the MS and MS/MS spectra of EIC peak 4, detected at 24.9 min during the analysis of the soluble fraction of E. coli cells expressing YvmC. The MS spectrum shows a main m/z peak at 263.0±0.1 (FIG. 49 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 49 b). The encircled m/z peak at 104.2±0.1 matches the immonium ion of Met, referred to as iMet, and the encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu or Ile, referred to as iLeu or iIle respectively.

FIG. 50 illustrates the MS and MS/MS spectra of EIC peak 5, detected at 25.4 min during the analysis of the soluble fraction of E. coli cells expressing YvmC. The MS spectrum shows an m/z peak at 245.1±0.1 not detected in the control sample (FIG. 50 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 50 b). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu or Ile, referred to as iLeu or iIle respectively.

FIG. 51 illustrates the MS and MS/MS spectra of EIC peak 6, detected at 25.9 min during the analysis of the soluble fraction of E. coli cells expressing YvmC. The MS spectrum shows a main m/z peak at 245.1±0.1 (FIG. 51 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 51 b). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu or Ile, referred to as iLeu or iIle respectively.

FIG. 52 illustrates the MS and MS/MS spectra of EIC peak 7, detected at 26.8 min during the analysis of the soluble fraction of E. coli cells expressing YvmC. The MS spectrum shows a main m/z peak at 297.0±0.1 (FIG. 52 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 52 b). The encircled m/z peak at 120.2±0.1 matches the immonium ion of Phe, referred to as iPhe, and the encircled m/z peak at 104.3±0.1 matches the immonium ion of Met, referred to as iMet.

FIG. 53 illustrates the MS and MS/MS spectra of EIC peak 8, detected at 27.3 min during the analysis of the soluble fraction of E. coli cells expressing YvmC. The MS spectrum shows a main m/z peak at 245.1±0.1 (FIG. 53 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 53 b). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu or Ile, referred to as iLeu or iIle respectively.

FIG. 54 illustrates the MS and MS/MS spectra of EIC peak 9, detected at 29.2 min during the analysis of the soluble fraction of E. coli cells expressing YvmC. The MS spectrum shows an m/z peak at 297.0±0.1 (FIG. 54 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 54 b). The encircled m/z peak at 120.1±0.1 matches the immonium ion of Phe, referred to as iPhe, and the encircled m/z peak at 104.2±0.1 matches the immonium ion of Met, referred to as iMet.

FIG. 55 illustrates the MS and MS/MS spectra of EIC peak 10, detected at 30.8 min during the analysis of the soluble fraction of E. coli cells expressing YvmC. The MS spectrum shows an m/z peak at 279.1±0.1 (FIG. 55 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 55 b). The encircled m/z peak at 120.1±0.1 matches the immonium ion of Phe, referred to as iPhe, and the encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu or Ile, referred to as iLeu or iIle respectively.

FIG. 56 illustrates the MS and MS/MS spectra of EIC peak 11, detected at 31.4 min during the analysis of the soluble fraction of E. coli cells expressing YvmC. The MS spectrum shows an m/z peak at 279.1±0.1 (FIG. 56 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 56 b). The encircled m/z peak at 86.5±0.1 matches the immonium ion of Leu or Ile, referred to as iLeu or iIle respectively, and the encircled m/z peak at 120.1±0.1 matches the immonium ion of Phe, referred to as iPhe.

FIG. 57 illustrates the MS and MS/MS spectra of EIC peak 12, detected at 33.3 min during the analysis of the soluble fraction of E. coli cells expressing YvmC. The MS spectrum shows a minor m/z peak at 313.1±0.1 not detected in the control sample (FIG. 57 a). This peak was isolated as the parent ion and subjected to MS/MS fragmentation, giving rise to a daughter ion spectrum (FIG. 57 b). The encircled m/z peak at 120.1±0.1 matches the immonium ion of Phe, referred to as iPhe.

All these EIC peaks, except peaks 1, 7, 9 and 12, correspond to linear dipeptides containing the isomass leucyl or isoleucyl residues (Table V and the figures numbered therein).

Finally, the comparison of the retention times and fragmentation patterns of the 12 linear dipeptides with those of the reference chemically-synthesized dipeptides (see Table III and the figures numbered therein) allowed the Inventors to conclude that the expression of YvmC-Bsub in E. coli cells is responsible for the in vivo formation of the following dipeptides: Ile-Met, Leu-Met, Met-Leu, Leu-Ile, Ile-Leu, Leu-Leu, Phe-Leu, Leu-Phe, Phe-Phe, Met-Met, Phe-Met and Met-Phe (see Table V). The two possible sequences of each detected linear dipeptide were always observed, except for Ile-Met, whose counterpart Met-Ile was not identified. It can reasonably be supposed that Met-Ile was also produced by YvmC-Bsub, but in a quantity too small to be detected.
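By way of illustration only, the comparison of a detected EIC peak with the reference dipeptides can be sketched as follows. The reference values are a subset of Table III and the example peaks are taken from Tables IV and V; the function name and the two tolerances are hypothetical, since the present description does not state numerical matching tolerances.

```python
# Subset of the reference data of Table III: (dipeptide, m/z, retention time in min).
REFERENCES = [
    ("Ile-Met", 263.0, 21.8),
    ("Leu-Met", 263.0, 22.9),
    ("Met-Ile", 263.0, 24.0),
    ("Met-Leu", 263.1, 25.3),
    ("Tyr-Tyr", 345.1, 23.5),
]

MZ_TOL = 0.2  # assumed m/z tolerance (not specified in this document)
RT_TOL = 0.5  # assumed retention-time tolerance in min (not specified in this document)


def identify(mz, rt, refs=REFERENCES):
    """Return the reference dipeptide with matching m/z that is closest in
    retention time, or None if no reference falls within both tolerances."""
    candidates = [
        (abs(rt - ref_rt), name)
        for name, ref_mz, ref_rt in refs
        if abs(mz - ref_mz) <= MZ_TOL and abs(rt - ref_rt) <= RT_TOL
    ]
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    print(identify(345.1, 23.3))  # peak 1 of Table IV
    print(identify(263.1, 21.8))  # peak 2 of Table V
    print(identify(263.0, 24.9))  # peak 4 of Table V
```

The four reference dipeptides at m/z 263 contain Met together with Leu or Ile and share the same immonium ions at the reported resolution, so in this sketch only the retention time separates them.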

In conclusion, the three tested CDSs (namely AlbC, Rv2275 and YvmC-Bsub) can be used to produce linear dipeptides when introduced into bacterial cells such as E. coli cells. Moreover, all CDSs which meet the criteria specified above are able to direct the in vivo synthesis of linear dipeptides.

EXAMPLE 3 Isolation of a New CDS Coding Sequence by a PCR-Based Approach

As indicated previously, Streptomyces noursei and Streptomyces albulus synthesize albonoursin. Streptomyces sp IMI 351155 has been reported to synthesize 1-N-methylalbonoursin (Biosynthesis of 1-N-methylalbonoursin by an endophytic Streptomyces sp. isolated from perennial ryegrass, Gurney and Mantle, J. Nat. Prod. 1993, 56:1194-1198). The Inventors have also found that this strain produces albonoursin in addition to 1-N-methylalbonoursin.

The Inventors sought to determine whether one or more CDS homologous genes exist in this strain.

The Inventors first performed hybridization experiments under stringent or non-stringent conditions, but these did not allow them to detect any fragment in the genomic DNA of Streptomyces sp IMI 351155 hybridizing with a probe corresponding to the gene albC, or with probes corresponding to other alb genes (e.g. albA and albB) from Streptomyces noursei.

It should be noted that the same type of hybridization experiments performed with total genomic DNA of Streptomyces albulus revealed DNA fragments hybridizing under stringent conditions. Further isolation and characterization of these fragments from Streptomyces albulus genomic DNA confirmed that they contained the genes directing albonoursin and linear dipeptide biosynthesis.

A Polymerase Chain Reaction (PCR) based approach was therefore developed to find and isolate the albC homologue from Streptomyces sp. IMI 351155, i.e. the gene responsible for linear dipeptide biosynthesis.

To design the primers for this PCR-based reaction, the Inventors used the two regions containing the conserved amino acid motifs in all the known CDSs, corresponding to SEQ ID NO:9 and SEQ ID NO:10. However, to limit the degeneracy of the primers, the Inventors took into account the partial conservation at some positions, even if this was not taken into account in the definition of the signatures H-X-[LVI]-[LVI]-G-[LVI]-S (SEQ ID NO:9) and Y-[LVI]-X-X-E-X-P (SEQ ID NO:10).
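The two signatures can be read as simple patterns over one-letter amino acid codes. As a minimal sketch only (not the Inventors' procedure), the Python below writes SEQ ID NO:9 and SEQ ID NO:10 as regular expressions and reports where each first occurs in a protein sequence, together with the number of residues between them. The function name `find_cds_motifs` and the toy sequence are hypothetical and serve only as illustration.

```python
import re

# X (any amino acid) is written as [A-Z]; [LVI] is kept as a character class.
MOTIF_1 = re.compile(r"H[A-Z][LVI][LVI]G[LVI]S")   # SEQ ID NO:9
MOTIF_2 = re.compile(r"Y[LVI][A-Z][A-Z]E[A-Z]P")   # SEQ ID NO:10


def find_cds_motifs(protein):
    """Return (start of motif 1, start of motif 2, residues between them).

    Starts are 1-based. Returns None unless motif 1 is found and motif 2
    occurs downstream of it. Hypothetical helper, for illustration only.
    """
    seq = protein.upper()
    first = MOTIF_1.search(seq)
    if first is None:
        return None
    # Look for the second motif only after the end of the first one.
    second = MOTIF_2.search(seq, first.end())
    if second is None:
        return None
    spacing = second.start() - first.end()
    return first.start() + 1, second.start() + 1, spacing


if __name__ == "__main__":
    # Toy sequence, not a real CDS: both motifs separated by ten alanines.
    toy = "MA" + "HALVGIS" + "A" * 10 + "YLAAEAP" + "K"
    print(find_cds_motifs(toy))
```

Such a scan only flags candidate sequences; it does not take into account the partial conservation at individual positions that was used to design the primers.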

The primers were designed from the sequences H-[LVA]-[LVI]-[LVI]-G-[VI]-S (SEQ ID NO:24) and Y-[VI]-[LICF]-[AD]-E-[ALI]-P-[LFA]-[FY] (SEQ ID NO:25, see FIGS. 59 and 60).

Part of the alignment of all CDS sequences in the region of the first motif is shown in FIG. 59, and the region used for primer design is indicated by a line under the alignment. The numbering is that of AlbC from S. noursei. The degenerated amino acid sequence is shown with the corresponding nucleotide sequence. The first primer was finalized as:

5′ CAC BYS NTS NTS GGS RTS WSS SC (SEQ ID NO: 22)

In which, for the nucleotides: B=C or G or T, N=A or C or G or T, R=A or G, S=C or G, W=A or T, Y=C or T.

Part of the alignment of all CDS sequences in the region of the second motif is shown in FIG. 60, and the region used for primer design is indicated by a line under the alignment. The numbering is that of AlbC from S. noursei. The degenerated amino acid sequence is shown with the corresponding nucleotide sequence, and the complementary strand (at the bottom) used as primer. The second primer was finalized as:

5′ ATG YAS DMS CKS CTC NRS GGS MRS AWG (SEQ ID NO: 23)

In which, for the nucleotides: D=A or G or T, K=G or T, M=A or C, N=A or C or G or T, R=A or G, S=C or G, W=A or T, Y=C or T.

To reduce the degeneracy of the primers, the codon usage of Streptomyces was taken into account. As the genomic DNA of Streptomyces is GC-rich, the third position in all codons is preferentially a C or G. Therefore, in the primers, all nucleotides corresponding to the third position in a codon were modified to either C or G (for example, Y residues in the primer became C, and N residues became S). The two degenerated primers used were Primer 1 5′-CACBYSNTSNTSGGSRTSWSSSC-3′ (SEQ ID NO:26) and Primer 2 5′-GWASRMSGGSRNCTCSKCSMDSAYGTA-3′ (SEQ ID NO:27).
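The effect of such restrictions on primer degeneracy can be quantified by multiplying, position by position, the number of bases each ambiguity code stands for. The following is a minimal sketch under that definition; the helper names `degeneracy` and `expand` are hypothetical, and the codes H and V are included from the standard IUPAC set although they do not appear in the primers above.

```python
from itertools import product

# IUPAC nucleotide codes; those used in the primers match the definitions
# given in the text (B, D, K, M, N, R, S, W, Y).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a (short) degenerate primer."""
    pools = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*pools)]


if __name__ == "__main__":
    print(degeneracy("NNN"), degeneracy("NNS"))    # unrestricted vs C/G third base
    print(degeneracy("CACBYSNTSNTSGGSRTSWSSSC"))   # Primer 1 (SEQ ID NO:26)
    print(expand("RY"))
```

Under these definitions a fully degenerate codon NNN stands for 64 sequences and NNS for 32, and Primer 1 as written represents 98,304 sequences. `expand` is only practical for short stretches, since the list grows multiplicatively with each ambiguous position.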

PCR using these primers was performed on cDNA obtained by reverse transcription of the total RNA extracted from Streptomyces sp. IMI 351155 after 3 days of cultivation in HT medium. This time of cultivation corresponds to the onset of dipeptide biosynthesis, a time when the dipeptide biosynthetic genes should be transcribed. Total RNA was extracted using well-established protocols and cDNAs were obtained using the SuperScript® First-Strand Synthesis System for RT-PCR kit from Invitrogen.

To enhance the specificity of the PCR reaction, ramping PCR conditions were used as follows: after an initial denaturation step at 95° C. for 2 min, the annealing temperature was initially 37° C., and it was increased to 72° C. in steps of 1° C. every 15 s. This was followed by denaturation at 95° C. for 30 s. Two such cycles were performed. Then the PCR program consisted of 35 cycles of 95° C. for 30 s, 55° C. for 1 min 30 s and 72° C. for 1 min. Taq polymerase was used.
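For readers who wish to transcribe this program into a thermocycler or check its duration, the ramping protocol can be written out step by step. The sketch below is illustrative only: the `Step` class and function names are hypothetical, the labels "annealing" and "extension" for the 55° C. and 72° C. steps are assumed, and it is assumed that every temperature of the ramp, including 37° C. and 72° C., is held for 15 s, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: int
    seconds: int


def ramp(start_c, end_c, step_c=1, hold_s=15):
    """Annealing ramp; assumes each temperature, endpoints included, is held hold_s."""
    return [Step("ramp", t, hold_s) for t in range(start_c, end_c + 1, step_c)]


def build_program():
    """Initial denaturation, two ramping cycles, then 35 standard cycles."""
    program = [Step("initial denaturation", 95, 120)]
    for _ in range(2):
        program += ramp(37, 72)
        program.append(Step("denaturation", 95, 30))
    for _ in range(35):
        program += [
            Step("denaturation", 95, 30),
            Step("annealing", 55, 90),   # label assumed, not given in the text
            Step("extension", 72, 60),   # label assumed, not given in the text
        ]
    return program


def total_seconds(program):
    """Sum of hold times; heating and cooling transitions are ignored."""
    return sum(step.seconds for step in program)


if __name__ == "__main__":
    program = build_program()
    print(len(program), "steps,", total_seconds(program) / 60, "min of hold time")
```

The total ignores the time the instrument needs to change temperature, so an actual run would take longer than the computed hold time.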

The PCR products obtained were separated by agarose gel electrophoresis. A faint band of about 470 bp was visible. DNA in the range 450-500 bp was extracted from the gel and a fraction was used as template for PCR amplification with primers 1 and 2. The PCR program consisted of an initial denaturation step at 95° C. for 2 min, followed by 35 cycles of 95° C. for 30 s, 55° C. for 1 min 30 s and 72° C. for 1 min. Taq polymerase was used. The PCR products were separated by agarose gel electrophoresis. A band of about 470 bp was clearly visible. This band was extracted from the gel and ligated to the vector pGEMT-Easy (Promega). The ligation mix was used to transform competent E. coli cells. Plasmids were extracted from nine clones and the nucleotide sequence of their inserts was determined. All the inserts were very similar, the differences between them being in the region corresponding to the two degenerated primers. The deduced products were similar to AlbC from Streptomyces noursei (42% identity in amino acids).

To obtain the complete albC homologue from Streptomyces sp. IMI 351155 (hereafter called albC-IMI), a gene library of the genomic DNA from Streptomyces sp. IMI 351155 was constructed in the cosmid pWED2 (Karray et al. 2007, Organization of the biosynthetic gene cluster for the macrolide antibiotic spiramycin in Streptomyces ambofaciens, Microbiology, in press). The cloned PCR fragment, corresponding to part of the albC-IMI gene, was used as a probe in a colony hybridization experiment. This led to the isolation of 4 clones which hybridized strongly with the probe. The cosmids that they contained were extracted and shown to have fragments in their inserts which hybridized with the albC-IMI probe.

These fragments were subcloned and their nucleotide sequences were determined. This led to the characterization of three genes, albA-IMI, albB-IMI and albC-IMI, encoding proteins which present respectively 51%, 50% and 40% amino acid identity with AlbA, AlbB and AlbC from Streptomyces noursei.
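Identity percentages of this kind are computed from a pairwise alignment. As a hedged illustration only, the sketch below counts identical residues over the aligned columns in which neither sequence has a gap; this choice of denominator is an assumption, other conventions exist, and the function name `percent_identity` and the example strings are hypothetical rather than taken from the sequences described here.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumption: columns containing a gap in either sequence are ignored,
    so the denominator is the number of gap-free aligned columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Two toy 7-residue strings sharing 4 identical positions.
    print(round(percent_identity("HALVGIS", "HSLIGVS"), 1))
```

The alignment itself must be produced beforehand by an alignment program; the function only scores an alignment it is given.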

1-34. (canceled)
35. A method for the production of a linear dipeptide, characterized in that it comprises the steps: a) culturing upon a medium a host cell which has the ability to produce a protein or an active fragment thereof having the activity to form a linear dipeptide from one or more kinds of amino acids; b) allowing said linear dipeptide to form and accumulate in said host cell and optionally in said medium; c) recovering said linear dipeptide from an extract of said host cell and optionally said medium; wherein said protein or an active fragment thereof is selected in the group consisting of proteins and fragments thereof, having at least 20% identity and no more than 90% identity with SEQ ID NO:1.
36. A method for the production of linear dipeptide, according to claim 35, wherein said protein or an active fragment thereof is encoded by an endogenous gene of said host cell.
37. A method for the production of linear dipeptide, according to claim 35, wherein said protein or an active fragment thereof is not encoded by an endogenous gene of said host cell.
38. A method for the production of linear dipeptide, according to claim 35, wherein said host cell comprises coding sequences for at least two proteins or active fragments thereof.
39. A method for the production of linear dipeptide, according to claim 35, wherein said at least two coding sequences come from different genes.
40. A method for the production of linear dipeptide, according to claim 35, wherein said at least two coding sequences come from a single gene.
41. A method for the production of linear dipeptide according to claim 35, wherein said protein or an active fragment thereof has at least 20% and no more than 35% identity with SEQ ID NO:1.
42. A method for the production of linear dipeptide, according to claim 35, wherein said protein or an active fragment thereof comprises a first conserved amino acid sequence of the general sequence SEQ ID NO:9: H-X-[LVI]-[LVI]-G-[LVI]-S (SEQ ID NO:9) wherein H=histidine, X=any amino acid, [LVI]=any one of leucine, valine or isoleucine, G=glycine and S=serine.
43. A method for the production of linear dipeptide, according to claim 35, wherein said protein or an active fragment thereof comprises a second conserved amino acid sequence of the general sequence SEQ ID NO:10: Y-[LVI]-X-X-E-X-P (SEQ ID NO:10) wherein Y=tyrosine, [LVI]=any one of leucine, valine or isoleucine, X=any amino acid, E=glutamic acid and P=proline.
44. A method for the production of linear dipeptide, according to claim 42, wherein said first conserved amino acid sequence and said second amino acid sequence are separated by at least 120 amino acid residues and no more than 160 amino acid residues.
45. A method for the production of linear dipeptide, according to claim 43, wherein said first conserved amino acid sequence and said second amino acid sequence are separated by at least 140 amino acid residues and no more than 150 amino acid residues.
46. A method for the production of linear dipeptide, according to claim 42, wherein said first conserved amino acid sequence corresponds to residues 31 to 37 of SEQ ID NO:1.
47. A method for the production of linear dipeptide, according to claim 43, wherein said second conserved amino acid sequence corresponds to residues 178 to 184 of SEQ ID NO:1.
48. A method for the production of linear dipeptide, according to claim 35, wherein said protein or an active fragment thereof was isolated from a microorganism belonging to the genus Bacillus, Corynebacterium, Mycobacterium, Streptomyces, Photorhabdus or Staphylococcus.
49. A method for the production of linear dipeptide, according to claim 35, wherein said protein or an active fragment thereof was isolated from a microorganism selected from the list Bacillus licheniformis, Bacillus subtilis subsp. subtilis, Bacillus thuringiensis serovar israelensis, Photorhabdus luminescens subsp. laumondii, Staphylococcus haemolyticus, Corynebacterium jeikeium, Mycobacterium tuberculosis, Mycobacterium bovis or Mycobacterium bovis BCG.
50. A method for the production of linear dipeptide, according to claim 35, wherein said protein or an active fragment thereof is selected from the group consisting of AlbC (SEQ ID NO:1), Rv2275 (SEQ ID NO:2), MT2335 (SEQ ID NO:2), MRA2294 (SEQ ID NO:2), TBFG12300 (SEQ ID NO:2), Mb2298 (SEQ ID NO:2), BCG2292 (SEQ ID NO:34), YvmC-Bsub (SEQ ID NO:3), YvmClic (SEQ ID NO:4), YvmC-Bthu (SEQ ID NO:5), pSHaeCO06 (SEQ ID NO:6), Plu0297 (SEQ ID NO:7), JK0923 (SEQ ID NO:8), AlbC-his (SEQ ID NO:35), Rv2275-his (SEQ ID NO:36), YvmC-Bsub-his (SEQ ID NO:37).
51. A method for the production of linear dipeptide, according to claim 35, wherein said linear dipeptide is selected from the group: Phe-Leu, Leu-Phe, Phe-Phe, Phe-Tyr, Tyr-Phe, Leu-Leu, Leu-Tyr, Tyr-Leu, Phe-Met, Met-Phe, Leu-Met, Met-Leu, Tyr-Met, Met-Tyr, Met-Met, Tyr-Tyr, Ile-Met, Met-Ile, Leu-Ile, Ile-Leu.
52. A method for the production of linear dipeptide, wherein said protein or an active fragment thereof is encoded by an isolated, natural or synthetic nucleic acid sequence coding selected from the group consisting of SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID NO:16, SEQ ID NO:20, SEQ ID NO:21, positions 114-861 of SEQ ID NO:17, positions 114-1008 of SEQ ID NO:18 and positions 114-885 of SEQ ID NO:19.
53. A recombinant vector comprising a nucleic acid coding sequence as claimed in claim 52, wherein said vector is configured to introduce said nucleic acid coding sequence into at least one host cell and said coding sequence is thereby expressed by the endogenous expression mechanisms of said host cell.
54. A recombinant vector comprising a nucleic acid coding sequence as claimed in claim 53, wherein said recombinant vector is selected from the group comprising SEQ ID NO:17, SEQ ID NO:18 and SEQ ID NO:19.
55. A recombinant vector, as claimed in claim 53, wherein said recombinant vector comprises coding sequences for at least two proteins or active fragments thereof.
56. A recombinant vector, as claimed in claim 53, wherein said at least two coding sequences come from different genes.
57. A recombinant vector, as claimed in claim 53, wherein said at least two coding sequences come from a single gene.
58. A recombinant vector, as claimed in claim 53, wherein said host cell is a prokaryote.
59. A recombinant vector, as claimed in claim 53, wherein said host cell is Escherichia coli.
60. A recombinant vector comprising said nucleic acid coding sequence as claimed in claim 52, wherein said vector is configured to express said nucleic acid coding sequence in a cell free expression system by the endogenous transcription mechanisms of said cell free expression system.
61. A method for the production of a linear dipeptide, characterized in that it comprises the steps: a) inducing a cell free expression system to produce a protein or an active fragment thereof, having the activity to form a dipeptide from one or more kinds of amino acids; b) introducing at least one amino acid substrate to said protein or an active fragment thereof; c) allowing said dipeptide to form and accumulate; d) recovering said dipeptide; wherein said protein or an active fragment thereof is selected in the group consisting of proteins and fragments thereof, having at least 20% identity and no more than 90% identity with SEQ ID NO:1.
62. A method of identifying polypeptides that catalyse the formation of a linear dipeptide of the general formula (i): R¹-R² (i) (wherein R¹ and R², which may be the same or different and each may represent any amino acid); characterized in that it comprises the steps: a) identifying a candidate polypeptide sequence as having at least one of the following motifs:

H-X-[LVI]-[LVI]-G-[LVI]-S (SEQ ID NO:9)

wherein H=histidine, X=any amino acid, [LVI]=any one of leucine, valine or isoleucine, G=glycine and S=serine; and wherein at least one of said H, LVI, G or S can be another amino acid, namely H can be replaced by any one of Lysine or Arginine; LVI can be replaced by any one of Glycine, Alanine, Leucine, Valine or Isoleucine; G can be replaced by any one of Glycine, Alanine, Leucine, Valine or Isoleucine; S can be replaced by Cysteine, Threonine or Methionine.

Y-[LVI]-X-X-E-X-P (SEQ ID NO:10)

wherein Y=tyrosine, [LVI]=any one of leucine, valine or isoleucine, X=any amino acid, E=glutamic acid and P=proline; and wherein at least one of said Y, LVI, E, X or P can be another amino acid, namely Y can be replaced by any one of Phenylalanine or Tryptophan; LVI can be replaced by any one of Glycine, Alanine, Leucine, Valine or Isoleucine; E can be replaced by any one of Aspartic Acid, Asparagine, Glutamine; P can be replaced by any one of Glycine, Alanine, Leucine, Valine or Isoleucine; b) creating a polypeptide expression construct by linking said candidate polypeptide coding sequence to promoter sequences configured to express said candidate peptide at an appreciable level; c) introducing said polypeptide expression construct into at least one cell and inducing the take up of said polypeptide expression construct by said at least one cell or a cell free expression system; d) monitoring the levels and types of linear dipeptides in the growth medium of said at least one cell or said cell free expression system; e) comparing the levels of linear dipeptides in the presence of said polypeptide expression construct to the levels of linear dipeptides in the absence of said polypeptide expression construct to determine the relative level of production of linear dipeptides by said polypeptide expression construct; and f) correlating the relative production of linear dipeptides to expression of said candidate polypeptide in said at least one cell or said cell free expression system.
63. A method of identifying polypeptides that catalyse the formation of a linear dipeptide of the general formula (i): R¹-R² (i) (wherein R¹ and R², which may be the same or different and each may represent any amino acid); characterized in that it comprises the steps: a) identifying a candidate polypeptide sequence as having both of the following motifs:

H-X-[LVI]-[LVI]-G-[LVI]-S (SEQ ID NO:9)

wherein H=histidine, X=any amino acid, [LVI]=any one of leucine, valine or isoleucine, G=glycine and S=serine; and wherein at least one of said H, LVI, G or S can be another amino acid, namely H can be replaced by any one of Lysine or Arginine; LVI can be replaced by any one of Glycine, Alanine, Leucine, Valine or Isoleucine; G can be replaced by any one of Glycine, Alanine, Leucine, Valine or Isoleucine; S can be replaced by Cysteine, Threonine or Methionine.

Y-[LVI]-X-X-E-X-P (SEQ ID NO:10)

wherein Y=tyrosine, [LVI]=any one of leucine, valine or isoleucine, X=any amino acid, E=glutamic acid and P=proline; and wherein at least one of said Y, LVI, E, X or P can be another amino acid, namely Y can be replaced by any one of Phenylalanine or Tryptophan; LVI can be replaced by any one of Glycine, Alanine, Leucine, Valine or Isoleucine; E can be replaced by any one of Aspartic Acid, Asparagine, Glutamine; P can be replaced by any one of Glycine, Alanine, Leucine, Valine or Isoleucine; b) creating a polypeptide expression construct by linking said candidate polypeptide coding sequence to promoter sequences configured to express said candidate peptide at an appreciable level; c) introducing said polypeptide expression construct into at least one cell and inducing the take up of said polypeptide expression construct by said at least one cell or a cell free expression system; d) monitoring the levels and types of linear dipeptides in the growth medium of said at least one cell or said cell free expression system; e) comparing the levels of linear dipeptides in the presence of said polypeptide expression construct to the levels of linear dipeptides in the absence of said polypeptide expression construct to determine the relative level of production of linear dipeptides by said polypeptide expression construct; and f) correlating the relative production of linear dipeptides to expression of said candidate polypeptide in said at least one cell or said cell free expression system.
64. A method for identifying polypeptides according to claim 63, wherein said first conserved motif (SEQ ID NO:9) and said second conserved motif (SEQ ID NO:10) are separated by at least 75 and no more than 250 amino acids.
65. A method for identifying polypeptides according to claim 63, wherein said first conserved motif (SEQ ID NO:9) and/or said second conserved motif (SEQ ID NO:10) comprise more than one residue change.
66. A method for identifying polypeptides according to claim 63, wherein step a) of said method comprises the amplification of candidate peptide coding nucleic acid sequences using degenerated primers of SEQ ID NO:22 and SEQ ID NO:23 in a Polymerase Chain Reaction.
67. A method of identifying polypeptides that catalyse the formation of a linear dipeptide of the general formula (i): R¹-R² (i) wherein R¹ and R², which may be the same or different and each may represent any amino acid; characterized in that it comprises the steps: a) identifying a candidate polypeptide sequence as having at least 20% identity and no more than 90% identity with SEQ ID NO:1; or having at least 20% identity with any one of SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:35, SEQ ID NO:36, SEQ ID NO:37; b) creating a polypeptide expression construct by linking said candidate polypeptide sequence to promoter sequences configured to express said candidate peptide at an appreciable level; c) introducing said polypeptide expression construct into at least one cell and inducing the take up of said polypeptide expression construct by said at least one cell or a cell free expression system; d) monitoring the levels and types of linear dipeptides in the growth medium of said at least one cell or said cell free expression system; e) comparing the levels of linear dipeptides in the presence of said polypeptide expression construct to the levels of linear dipeptides in the absence of said polypeptide expression construct to determine the relative level of production of linear dipeptides by said polypeptide expression construct; and f) correlating the relative production of linear dipeptides to expression of said candidate polypeptide in said at least one cell or said cell free expression system.